MPI tests.
Run an MPI test on a single (updated or rebuilt) node, and on two nodes.
run_two_node.sh --newnode=<node> --reservation=<res> [ --partition=<partition> --testnode=<testnode> ]
e.g. run_two_node.sh --newNode=gf00 --reservation=monMaintenance
e.g. run_two_node.sh --newNode=gf00 --reservation=monMaintenance --partition=gpu --testnode=ge00
Where
<node> is the name of the updated host we wish to test
<res> is the name of the SLURM reservation
<partition> is the name of the SLURM partition (defaults to comp)
<testnode> is a second (not updated) node used for the two-node MPI test (defaults to gf01)
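The options and defaults above could be parsed along the following lines. This is a minimal sketch, not the actual contents of run_two_node.sh: the function name parse_args and the variable names are hypothetical, and accepting both --newnode and --newNode is an assumption made only because the usage line and the examples spell that option differently.

```shell
#!/bin/sh
# Hypothetical sketch: parse the documented options into shell variables.
# Defaults for partition (comp) and testnode (gf01) come from the text above.
parse_args() {
    newnode=""
    reservation=""
    partition="comp"
    testnode="gf01"
    for arg in "$@"; do
        case "$arg" in
            # Assumption: accept both spellings seen in the usage text.
            --newnode=*|--newNode=*) newnode="${arg#*=}" ;;
            --reservation=*)         reservation="${arg#*=}" ;;
            --partition=*)           partition="${arg#*=}" ;;
            --testnode=*)            testnode="${arg#*=}" ;;
            *)
                echo "unknown option: $arg" >&2
                return 2
                ;;
        esac
    done
    # The node under test and the reservation are both mandatory.
    if [ -z "$newnode" ] || [ -z "$reservation" ]; then
        echo "usage: run_two_node.sh --newnode=<node> --reservation=<res> [ --partition=<partition> --testnode=<testnode> ]" >&2
        return 2
    fi
    return 0
}
```

Calling parse_args "$@" at the top of a script would leave the four values in newnode, reservation, partition and testnode, and return non-zero if a required option is missing.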
This script assumes:
- the SLURM controller is up (so there is an MPI environment to use)
- jobs are submitted with srun
- a SLURM reservation already exists, which you must specify
- Open MPI is used (openmpi/1.10.7-mlx)
- a timer kills the shell in case srun hangs. This timer still seems to fire even when
the script exits normally, i.e. it can activate after the script has already finished.
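One way to keep such a timer from firing after a normal exit is to cancel it once the command has finished. The sketch below illustrates that idea only; it is not taken from run_two_node.sh. The function name run_with_watchdog is hypothetical, and in the real script the command would be the srun invocation.

```shell
#!/bin/sh
# Hypothetical sketch: run a command with a watchdog timer, and cancel
# the timer as soon as the command finishes on its own.
# Usage: run_with_watchdog SECONDS COMMAND [ARGS...]
run_with_watchdog() {
    limit="$1"
    shift

    # Start the command in the background so we can time it.
    "$@" &
    cmd_pid=$!

    # Watchdog: after $limit seconds, terminate the command if it is still running.
    ( sleep "$limit"; kill -TERM "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    timer_pid=$!

    # Wait for the command; record its exit status without aborting under `set -e`.
    status=0
    wait "$cmd_pid" || status=$?

    # Cancel the watchdog so it cannot fire after we have returned.
    kill "$timer_pid" 2>/dev/null || true
    wait "$timer_pid" 2>/dev/null || true

    return "$status"
}
```

For example, run_with_watchdog 300 srun ... would give the job five minutes; a command killed by the watchdog returns status 143 (128 plus SIGTERM). Where GNU coreutils is available, the timeout command offers similar behaviour without a hand-written timer.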