HPCasCode merge requests
https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests
Updated: 2020-05-15T20:48:22+10:00

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/121
changed zz_modulecmd to modulecmd, so the list of custom modules load properly
Updated: 2020-05-15T20:48:22+10:00
Author: Jafar Lie
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/122
Update provision_slurm.py.j2 to import defaultdict to silence this error
Updated: 2020-05-15T20:48:22+10:00
Author: Kerri Wait
```python
Traceback (most recent call last):
  File "/usr/local/sbin/provision_slurm.py", line 100, in <module>
    mk_slurmuser_batch(usergroup,"default")
  File "/usr/local/sbin/provision_slurm.py", line 67, in mk_slurmuser_batch
    userdict = defaultdict(list)
NameError: global name 'defaultdict' is not defined
```
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/123
Update cron intervals for provisioning
Updated: 2020-05-15T20:46:28+10:00
Author: Kerri Wait
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/124
add a flag to nvidia-xconfig because it is not inputting busids for the m3f class nodes
Updated: 2020-05-15T20:48:22+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/125
Added new role to enable Slurm DB SQL to be backed up and then ssh to a node.
Updated: 2020-05-15T20:48:23+10:00
Author: Simon Michnowicz
Location of node, backup dir, and dummy user account are contained in defaults/main.yml
Both SQL nodes and Management node need to have this role applied, with 'server' parameter determining the different
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/126
fix the volcreate to be more sane
Updated: 2020-05-15T20:48:22+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/127
update the hpcsystems role to include the naggy_quota cronjob and update the tools via pip
Updated: 2020-05-15T20:48:23+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/128
add a collectd role that can be deployed to compute and login nodes
Updated: 2020-05-15T20:48:24+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/129
Update the mofed version
Updated: 2020-05-15T20:48:24+10:00
Author: Gin Tan
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/130
Dropping cache on epilog & prolog
Updated: 2020-05-15T20:48:25+10:00
Author: Gin Tan
Added
1. epilog & prolog to drop pagecache
2. clean up user files in /tmp
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/131
WIP: Nagios slurmctld
Updated: 2017-07-31T13:36:17+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/132
nagios script for slurmctld
Updated: 2020-05-15T20:48:25+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/133
Improve nat role
Updated: 2018-05-15T11:14:46+10:00
Author: Simon Michnowicz
Several changes made:
nat_server_role: feed in IP OF VLAN PRIVATE ADDRESS so we can work out name of eth device. Templates device names in iptables
Gluster_roles: add quorum command. Also modifies ping value in a conf file in an (unsuccessful) attempt to solve issues on first build
Minor mods in sql role to improve comments and to remove issue with next version of ansible
removes default repos that get reinstalled after a yum update
Assignee: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/134
collectd monitoring changes ... monitor additional things in proc, don't monitor slabinfo as it's noisy, monitor gpus
Updated: 2020-05-15T20:49:14+10:00
Author: Chris Hines
Assignee: Gin Tan

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/135
Gluster
Updated: 2020-05-15T20:49:14+10:00
Author: Chris Hines

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/136
change the way a script is called to determine number of GPUs
Updated: 2018-05-15T11:19:07+10:00
Author: Chris Hines
Unfortunately this new way also produced weird Python errors, which
could only be fixed by running as root (which is probably the problem with the original way)
Assignee: Simon Michnowicz

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/137
Nat role
Updated: 2020-01-28T15:09:26+11:00
Author: Chris Hines
Assignee: Simon Michnowicz

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/138
Ansible include errors
Updated: 2018-05-15T11:20:16+10:00
Author: Chris Hines
Assignee: Simon Michnowicz

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/139
Remove unused fan_speed monitoring, add monitoring of memory and gpu utilization
Updated: 2020-05-15T20:49:13+10:00
Author: Chris Hines
Assignee: Lance Wilson

https://gitlab.erc.monash.edu.au/hpc-team/HPCasCode/-/merge_requests/140
finally corrected the cuda collectd script
Updated: 2020-05-15T20:48:37+10:00
Author: Chris Hines
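
Note on merge request 122: the NameError in its traceback is what Python raises when `defaultdict` is used without first being imported from `collections`. The sketch below shows the kind of fix the title describes, assuming the template simply lacked that import. Only the `defaultdict(list)` call comes from the traceback; the helper name `group_users` and the sample data are hypothetical and are not taken from provision_slurm.py.j2.

```python
from collections import defaultdict  # the import the traceback suggests was missing


def group_users(pairs):
    """Hypothetical helper: group (group, user) pairs into {group: [users]}."""
    userdict = defaultdict(list)  # same call as line 67 in the traceback
    for group, user in pairs:
        # A missing key is created as an empty list on first access.
        userdict[group].append(user)
    return dict(userdict)


if __name__ == "__main__":
    # Illustrative data only.
    print(group_users([("default", "alice"), ("default", "bob"), ("gpu", "carol")]))
```

Without the first line, the `defaultdict(list)` call fails with the same NameError as in the traceback; with it, the grouping runs without needing to pre-create each key.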