HPCasCode merge requests
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests
Updated: 2022-05-23T15:56:21+10:00

adding role for agent based tenable scans
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/561
Updated: 2022-05-23T15:56:21+10:00
Author: Andreas Hamacher
tested on ubuntu 20
Assignee: Chris Hines

making cgroup clusterspecific and re-enabling spank plugin
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/560
Updated: 2022-05-06T15:44:08+10:00
Author: Andreas Hamacher
cgroup.conf cannot be templated centrally anymore because mig devices work differently e.g. for mlerp and gerp

Updated SLURM version and NVIDIA public key
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/559
Updated: 2022-05-05T15:08:34+10:00
Author: Jay Van Schyndel
Assignee: Jay Van Schyndel

cgroup.conf cannot be templated centrally anymore because mig devices work...
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/558
Updated: 2022-06-02T15:44:19+10:00
Author: Andreas Hamacher
cgroup.conf cannot be templated centrally anymore because mig devices work differently e.g. for mlerp and gerp
Assignee: Chris Hines

Improve mysqldump command
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/557
Updated: 2022-05-05T13:41:41+10:00
Author: Simon Michnowicz
added --single-transaction to mysqldump backup command
please note that this flag is position dependent. You cannot put it before --defaults-file
This has been tested on M3 and Monarch clusters.
Assignee: Trung Nguyen

removing deprecated flag TaskAffinity=yes for slurm21
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/556
Updated: 2022-05-05T12:41:34+10:00
Author: Andreas Hamacher
Assignee: Chris Hines

managementnodes also need a bigger tmpfs
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/555
Updated: 2022-04-21T16:37:03+10:00
Author: Andreas Hamacher
Assignee: Simon Michnowicz

applying MIG via systemd to make it reboot persistent
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/554
Updated: 2022-04-26T15:55:06+10:00
Author: Andreas Hamacher
nvidia does some very strange deployments here....
systemd is calling an nvidia script which expects a config file in a certain place and uses a local environment variable to parameterize the systemd thing
who am I to judge :smile:
This change does not call the parted binary directly; it installs the systemd service and enables/starts it so the configuration persists across reboots.
tested on qcif-node01
Assignee: Chris Hines

Missing quotes.
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/553
Updated: 2022-04-07T12:01:00+10:00
Author: Jay Van Schyndel
Oops, missing quotes.
Assignee: Jay Van Schyndel

Jay
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/552
Updated: 2022-04-07T11:12:21+10:00
Author: Jay Van Schyndel
Added variables for user defined Cuda and libcudnn versions.
Assignee: Jay Van Schyndel

Mpiuplift
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/551
Updated: 2022-04-14T15:37:18+10:00
Author: Andreas Hamacher
Assignee: Simon Michnowicz

Added checks for an NVIDIA GPU, mig compatible GPU.
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/550
Updated: 2022-03-15T13:31:20+11:00
Author: Jay Van Schyndel
Only runs when an NVIDIA GPU is detected; mig is only set up/configured when that GPU supports MIG.
Assignee: Jay Van Schyndel

remove most nhc checks as its difficult to link the number of CPUs the...
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/549
Updated: 2022-03-02T12:20:35+11:00
Author: Chris Hines
remove most nhc checks as it's difficult to link the number of CPUs the instance will be created with to the nhc file
Assignee: Andreas Hamacher

adding role for x11 forwarding
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/548
Updated: 2022-02-03T19:03:54+11:00
Author: Andreas Hamacher
to be used for monarch_logins
Assignee: Simon Michnowicz

Buildslurmwithpmi
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/547
Updated: 2022-04-27T14:28:55+10:00
Author: Andreas Hamacher
we have nodes without pmi which is impacting some jobs
https://monasheresearch.freshdesk.com/a/tickets/28868
code tested on massive004. It only works on new slurm installs.
to fix the nodes already in the queue run
```
# Install the PMI and PMI2 libraries from the existing Slurm source tree
cd /opt/src/slurm-20.02.7/contribs/pmi && sudo make install
cd /opt/src/slurm-20.02.7/contribs/pmi2 && sudo make install
```
Assignee: Trung Nguyen

Made /var/lib/sssd 80 M instead of 40M
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/546
Updated: 2022-01-14T19:15:56+11:00
Author: Simon Michnowicz
Assignee: Andreas Hamacher

adding ucx dependency
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/545
Updated: 2022-01-05T12:26:55+11:00
Author: Andreas Hamacher
self merging since this is a very safe "trivial" change and should not rot in MR land
Assignee: Andreas Hamacher

adding munge dependencies
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/544
Updated: 2021-12-14T14:56:09+11:00
Author: Andreas Hamacher
Assignee: Simon Michnowicz

debian lustre moduls have different names
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/543
Updated: 2021-12-09T16:34:07+11:00
Author: Andreas Hamacher
lustre packages need to be removed if the kernel changes.
the kernel modules need to be rebuilt, but AFTER the mellanox module is built and I don't know how to configure an order in dkms
Assignee: Simon Michnowicz

fixing path for ubuntu
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/542
Updated: 2021-12-09T15:50:34+11:00
Author: Andreas Hamacher
this path exists on centos7 and ubuntu.
ubuntu currently keeps installing the driver every time I run the role
Assignee: Simon Michnowicz
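
Note on merge request 557: the option-ordering constraint mentioned there (--single-transaction may not come before --defaults-file) follows from MySQL client programs requiring --defaults-file to be the first option on the command line. A minimal sketch of a command line in a valid order is shown below. The helper name build_mysqldump_cmd, the example path /root/.my.cnf, and the --all-databases option are assumptions for illustration only; they are not taken from the actual backup command in the merge request.

```shell
#!/bin/sh
# Hypothetical helper (not from the merge request): prints a mysqldump
# command line with the options in an order that mysqldump accepts.
build_mysqldump_cmd() {
    defaults_file=$1
    # --defaults-file has to be the first option; --single-transaction
    # and any other options follow it. --all-databases is an assumption.
    printf 'mysqldump --defaults-file=%s --single-transaction --all-databases\n' \
        "$defaults_file"
}

# Example path is an assumption; substitute the cluster's real option file.
build_mysqldump_cmd /root/.my.cnf
```

The function only prints the command so that the ordering can be inspected before anything is run against a live database.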