- Jan 10, 2018
Chris Hines authored
Make slurmctld and slurmdbd explicitly disabled on startup (rather than leaving them in whatever state they are currently set to)
- Jan 09, 2018
Gin Tan (Monash University) authored
- Nov 27, 2017
Jafar Lie authored
- Nov 24, 2017
Jafar Lie authored
- Nov 13, 2017
Jafar Lie authored
- Aug 30, 2017
Kerri Wait authored
- Aug 11, 2017
Chris Hines authored
Finally corrected the CUDA collectd script. See merge request !140
Chris Hines authored
Chris Hines authored
Lance Wilson authored
Remove unused fan_speed monitoring; add monitoring of memory and GPU utilization. See merge request !139
Chris Hines authored
- Aug 08, 2017
Chris Hines authored
- Aug 07, 2017
Chris Hines authored
Gluster. See merge request !135
Changed a role to run with delegate_to: "{{ gluster_servers[0] }}" so it is consistent within the playbook, added a ping value, and restarted the daemon. This was an attempt to fix the startup bugs we see on Monarch.
Chris Hines authored
collectd monitoring changes: monitor additional things in proc, don't monitor slabinfo as it's noisy, monitor GPUs
- Jul 31, 2017
Chris Hines authored
Nagios script for slurmctld. See merge request !132
Chris Hines authored
- Jul 13, 2017
Chris Hines authored
Dropping cache on epilog & prolog. See merge request !130
Chris Hines authored
- Jul 11, 2017
Gin Tan (Monash University) authored
Gin Tan (Monash University) authored
Gin Tan (Monash University) authored
- Jul 10, 2017
Gin Tan (Monash University) authored
- Jul 05, 2017
Gin Tan (Monash University) authored
- Jun 19, 2017
Chris Hines authored
Add a collectd role that can be deployed to compute and login nodes. See merge request !128
Chris Hines authored
- May 26, 2017
Chris Hines authored
Update the hpcsystems role to include the naggy_quota cron job and update the tools via pip. See merge request !127
Chris Hines authored
Fix volcreate to be more sane. See merge request !126
Chris Hines authored
Chris Hines authored
- May 18, 2017
Chris Hines authored
Added a new role that backs up the Slurm DB SQL and then sends it to a node over ssh. See merge request !125
- May 15, 2017
Simon Michnowicz (Monash University) authored
- May 11, 2017
Simon Michnowicz (Monash University) authored
The location of the node, the backup dir, and the dummy user account are set in defaults/main.yml. Both the SQL nodes and the management node need to have this role applied, with the 'server' parameter determining the different…
- Apr 21, 2017
Chris Hines authored
Add a flag to nvidia-xconfig because it is not inputting bus IDs for the m3f class nodes. See merge request !124
Chris Hines authored
- Mar 30, 2017
Chris Hines authored
Update provision_slurm.py.j2 to import defaultdict to fix the NameError it raised. See merge request !122
- Mar 27, 2017
Kerri Wait authored
Traceback (most recent call last):
  File "/usr/local/sbin/provision_slurm.py", line 100, in <module>
    mk_slurmuser_batch(usergroup,"default")
  File "/usr/local/sbin/provision_slurm.py", line 67, in mk_slurmuser_batch
    userdict = defaultdict(list)
NameError: global name 'defaultdict' is not defined
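The NameError in the traceback means defaultdict was called without being imported from the standard collections module. The following is a minimal sketch of that fix, not the actual template code. Only the defaultdict(list) call comes from the traceback. The (user, group) pairs in usergroup are hypothetical sample data, since the real input shape is not shown.

```python
# Without this import, `defaultdict(list)` fails with
# NameError: global name 'defaultdict' is not defined (Python 2 wording).
from collections import defaultdict

# Hypothetical sample data: (user, group) pairs; the real usergroup shape is assumed.
usergroup = [("alice", "proj1"), ("bob", "proj1"), ("carol", "proj2")]

userdict = defaultdict(list)  # a missing key starts out as an empty list
for user, group in usergroup:
    userdict[group].append(user)  # no need to pre-create each group's list

print(dict(userdict))
```

Using defaultdict(list) avoids checking whether each group key already exists before appending.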