HPCasCode merge requests
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests
Updated: 2020-05-15T20:49:13+10:00

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/139
Title: Remove unused fan_speed monitoring, add monitoring of memory and gpu utilization
Updated: 2020-05-15T20:49:13+10:00
Author: Chris Hines
Assignee: Lance Wilson

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/135
Title: Gluster
Updated: 2020-05-15T20:49:14+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/134
Title: collectd monitoring changes ... monitor additional things in proc, don't monitor…
Updated: 2020-05-15T20:49:14+10:00
Author: Chris Hines
Description:
… slabinfo as its noisy, monitor gpus
Assignee: Gin Tan

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/132
Title: nagios script for slurmctld
Updated: 2020-05-15T20:48:25+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/131
Title: WIP: Nagios slurmctld
Updated: 2017-07-31T13:36:17+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/130
Title: Dropping cache on epilog & prolog
Updated: 2020-05-15T20:48:25+10:00
Author: Gin Tan
Description:
Added
1. epilog & prolog to drop pagecache
2. clean up user files in /tmp
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/129
Title: Update the mofed version
Updated: 2020-05-15T20:48:24+10:00
Author: Gin Tan
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/128
Title: add a collectd role that can be deploed to compute and login nodes
Updated: 2020-05-15T20:48:24+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/127
Title: update the hpcsystems role to include the naggy_quota cronjob and update the tools via pip
Updated: 2020-05-15T20:48:23+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/126
Title: fix the volcreate to be more sane
Updated: 2020-05-15T20:48:22+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/125
Title: Added new role to enable Slurm DB SQL to be backed up and then ssh to a node.
Updated: 2020-05-15T20:48:23+10:00
Author: Simon Michnowicz
Description:
Location of node, backup dir, and dummy user account are contained in defaults/main.yml
Both SQL nodes and Management node need to have this role applied, with 'server' parameter determining the different
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/124
Title: add a flag to nvidia-xconfig because it is not inputing busids for the m3f class nodes
Updated: 2020-05-15T20:48:22+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/123
Title: Update cron intervals for provisioning
Updated: 2020-05-15T20:46:28+10:00
Author: Kerri Wait
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/122
Title: Update provision_slurm.py.j2 to import defaultdict to silence this error
Updated: 2020-05-15T20:48:22+10:00
Author: Kerri Wait
Description:
```python
Traceback (most recent call last):
  File "/usr/local/sbin/provision_slurm.py", line 100, in <module>
    mk_slurmuser_batch(usergroup,"default")
  File "/usr/local/sbin/provision_slurm.py", line 67, in mk_slurmuser_batch
    userdict = defaultdict(list)
NameError: global name 'defaultdict' is not defined
```
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/121
Title: changed zz_modulecmd to modulecmd, so the list of custom modules load properly
Updated: 2020-05-15T20:48:22+10:00
Author: Jafar Lie
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/120
Title: Fix provision slurm
Updated: 2020-05-15T20:48:22+10:00
Author: Kerri Wait
Description:
Fixing a logical error in provision_slurm.py
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/119
Title: tempalte xorg.conf when running on a node with 1GPU (this template works for M3 …
Updated: 2020-05-15T20:48:20+10:00
Author: Chris Hines
Description:
…K1 nodes, will need to do something smarter for other clusters with GPUS)

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/118
Title: Ansible check
Updated: 2020-05-15T20:48:20+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/117
Title: missing pip from the list of dependencies
Updated: 2020-05-15T20:48:19+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/116
Title: start ntpd by default
Updated: 2020-05-15T20:48:19+10:00
Author: Chris Hines
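
Note on merge request 122: the NameError in its traceback is what Python raises when `defaultdict` is used without first importing it from `collections`. The following is a minimal sketch of that kind of fix, not the actual contents of provision_slurm.py.j2. The function name `group_users_by_account` and the sample account and user names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; this is not the code from provision_slurm.py.j2.
from collections import defaultdict  # the import whose absence causes the NameError


def group_users_by_account(pairs):
    """Group (account, user) pairs into {account: [users]} (hypothetical helper)."""
    userdict = defaultdict(list)  # same call that appears in the traceback
    for account, user in pairs:
        userdict[account].append(user)  # missing keys start as an empty list
    return dict(userdict)


if __name__ == "__main__":
    # Hypothetical sample data
    sample = [("default", "alice"), ("default", "bob"), ("gpu", "carol")]
    print(group_users_by_account(sample))
```

Without the import line, the `defaultdict(list)` call fails at run time exactly as in the traceback, because the name is only looked up when the function executes.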