HPCasCode merge requests
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests
Updated: 2024-02-06T13:57:42+11:00

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/601
moved module init script names to run last
Updated: 2024-02-06T13:57:42+11:00
Author: Simon Michnowicz
Description: delete old file
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/600
updated default mysql parameters for slurmdbd
Updated: 2024-01-17T15:20:37+11:00
Author: Simon Michnowicz
Description: See https://slurm.schedmd.com/accounting.html
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/599
removed fields not needed for latest slurm
Updated: 2024-02-06T13:58:17+11:00
Author: Simon Michnowicz
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/598
It should not be necessary to drop vfs caches before/after each job and in...
Updated: 2023-12-13T13:55:08+11:00
Author: Chris Hines
Description: It should not be necessary to drop vfs caches before/after each job, and in most cases of a shared node it will be harmful.
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/597
Rocky update
Updated: 2023-10-26T16:27:28+11:00
Author: Simon Michnowicz
Description: update slurm build to include rocky dependencies
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/596
we attempt to generate a unique file name for every possible combination of...
Updated: 2023-10-24T12:36:48+11:00
Author: Chris Hines
Description: we attempt to generate a unique file name for every possible combination of GPUs. The old code worked fine for <10 gpus but failed hard for 16. Deploying this should resolve https://monasheresearch.freshdesk.com/a/tickets/60510
Assignee: Simon Michnowicz

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/595
Slurm upgrade and new rocky sql server
Updated: 2023-10-11T16:47:57+11:00
Author: Simon Michnowicz
Description:
  - added Rocky install script for mysql
  - (independent of slurm upgrade)
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/594
Fixmailchimp
Updated: 2023-08-23T16:22:17+10:00
Author: Simon Michnowicz
Description: fixed a typo in the format command
Assignee: Simon Michnowicz

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/593
Add ec2-user to pamd ssh allow user list. This will enable ec2-user to be...
Updated: 2023-07-07T12:14:29+10:00
Author: Swe Aung
Description: Add ec2-user to the pamd ssh allow user list. This will enable ec2-user to log in to the systems even when the slurmd process is not started.

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/592
tweaks to the reboot parameters: I suspect ansible is struggling when the...
Updated: 2023-01-17T14:31:47+11:00
Author: Chris Hines
Description: tweaks to the reboot parameters: I suspect ansible is struggling when the bastion node is rebooted and it tries to connect to the other nodes
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/591
cron was spewing errors because of an unescaped % (modulo operation)
Updated: 2022-12-19T12:19:57+11:00
Author: Chris Hines
Assignee: Philip Chan (philip.chan@monash.edu)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/590
Update roles/telegraf/templates/telegraf.conf.j2
Updated: 2022-11-07T15:27:04+11:00
Author: Kerri Wait
Description: @tngu0050 can you please merge + run this role on M3? it's to update the telegraf config to get around an ssl issue
Assignee: Trung Nguyen

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/589
symlinker should be run once per play to avoid starting slurm without the symlinks being generated
Updated: 2022-10-14T13:43:31+11:00
Author: Andreas Hamacher
Assignee: Andreas Hamacher

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/588
post fs04 migration maintenance checking.
Updated: 2022-10-14T09:55:24+11:00
Author: Andreas Hamacher
Description: since this reflects the state of the cluster there is little need for a QA review
Assignee: Andreas Hamacher

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/587
adding a script to catch a generic nvidia-smi error
Updated: 2022-10-19T13:00:23+11:00
Author: Andreas Hamacher
Description:
  this should catch the issue on the m3a nodes and similar in the future
  tested on m3p006 (good case) and m3a101 (bad case)
Assignee: Simon Michnowicz

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/586
do not restart slurmd if it is not intended to
Updated: 2022-10-03T14:21:11+11:00
Author: Andreas Hamacher
Assignee: Andreas Hamacher

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/585
Decom consistency1 fixing epel dependencies
Updated: 2022-09-26T11:05:02+10:00
Author: Andreas Hamacher
Assignee: Andreas Hamacher

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/584
Decom consistency0
Updated: 2022-09-26T10:56:30+10:00
Author: Andreas Hamacher
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/583
Unattended upgrades role
Updated: 2022-09-28T10:02:20+10:00
Author: Andreas Hamacher
Description:
  deploys a daily cronjob to update packages.
  populates a versionlock for packages using either apt_preferences role or yum versionlock plugin
  fully untested! the purpose of this draft is to have a baseline for discussion
  upgrade role can be further simplified
  upgrade role currently wipes the settings of unattended-upgrade. We could rerun unattended upgrades using the same tag or try a different mechanism.
Assignee: Geoff Duniam

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/582
Spankfix42
Updated: 2022-09-28T08:58:05+10:00
Author: Andreas Hamacher