nhc_version: 1.4.2
nhc_src_url: https://codeload.github.com/mej/nhc/tar.gz/refs/tags/1.4.2
nhc_src_checksum: "sha1:766762d2c8cd81204b92d4921fb5b66616351412"
nhc_src_dir: /opt/src/nhc-1.4.2
nhc_dir: /opt/nhc-1.4.2
slurm_version: 21.08.8
slurm_src_url: https://download.schedmd.com/slurm/slurm-21.08.8.tar.bz2
slurm_src_checksum: "sha1:7d37dbef37b25264a1593ef2057bc423e4a89e81"
slurm_version: 22.05.3
slurm_src_url: https://download.schedmd.com/slurm/slurm-22.05.3.tar.bz2
slurm_src_checksum: "sha1:55e9a1a1d2ddb67b119c2900982c908ba2846c1e"
slurm_src_dir: /opt/src/slurm-{{ slurm_version }}
slurm_dir: /opt/slurm-{{ slurm_version }}
ucx_version: 1.8.0
ucx_src_url: https://github.com/openucx/ucx/releases/download/v1.8.0/ucx-1.8.0.tar.gz
ucx_src_checksum: "sha1:96f2fe1918127edadcf5b195b6532da1da3a74fa"
ucx_src_dir: /opt/src/ucx-1.8.0
ucx_dir: /opt/ucx-1.8.0
munge_version: 0.5.14
munge_src_url: https://github.com/dun/munge/archive/refs/tags/munge-0.5.14.tar.gz
munge_src_checksum: "sha1:70f6062b696c6d4f17b1d3bdc47c3f5eca24757c"
munge_dir: /opt/munge-0.5.14
munge_src_dir: /opt/src/munge-munge-0.5.14
nvidia_mig_parted_version: 0.1.3
nvidia_mig_parted_src_url: https://github.com/NVIDIA/mig-parted/archive/refs/tags/v0.1.3.tar.gz
nvidia_mig_parted_src_checksum: "sha1:50597b4a94348c3d52b3234bb22783fa236f1d53"
nvidia_mig_parted_src_dir: /opt/src/mig-parted-0.1.3
nvidia_mig_slurm_discovery_version: master
nvidia_mig_slurm_discovery_src_url: https://gitlab.com/nvidia/hpc/slurm-mig-discovery.git
nvidia_mig_slurm_discovery_src_dir: /opt/src/mig-slurm_discovery
nvidia_cuda_version: cuda
nvidia_libcudnn_version: libcudnn8-dev
ansible_cluster_in_a_box
========================
**HPCasCode**
=============
The aim of this repo is to provide a set of Ansible roles that can be used to deploy a cluster.
We are working from
https://docs.google.com/a/monash.edu/spreadsheets/d/1IZNE7vMid_SHYxImGVtQcNUiUIrs_Nu1xqolyblr0AE/edit#gid=0
as our architecture document.
[![pipeline status](https://gitlab.erc.monash.edu.au/hpc-team/ansible_cluster_in_a_box/badges/cicd/pipeline.svg)](https://gitlab.erc.monash.edu.au/hpc-team/ansible_cluster_in_a_box/commits/cicd) [Issues](https://gitlab.erc.monash.edu.au/hpc-team/ansible_cluster_in_a_box/-/issues)
We aim to make these roles as generic as possible. You should be able to start from an inventory file, an SSH key and a git clone of this repository, and end up with a working cluster. In the longer term we might branch to include utilities to make an inventory file using NeCTAR credentials.
1. [ Introduction / Purpose ](#Introduction)
2. [ Getting started ](#gettingstarted)
3. [ Features ](#Features)
4. [ Assumptions ](#Assumptions)
5. [ Configuration ](#Configuration)
6. [ Contribute and Collaborate ](#Contribute)
7. [ Used by ](#partners)
8. [ CICD Coverage ](#coverage)
9. [ Roadmap ](#Roadmap)
If you need a password, use get_or_make_password.py (delegated to the password server/localhost) to generate a random one that can be shared between nodes.
Here is an example task (taken from setting up karaage):
- name: mysql db
mysql_db: name=karaage login_user=root login_password={{ sqlrootPasswd.stdout }}
<a name="Introduction"></a>
## Introduction / Purpose TODO
- name: karaage sql password
shell: ~/get_or_make_passwd.py karaageSQL
delegate_to: 127.0.0.1
register: karaageSqlPassword
The purpose of this repository is to deploy **H**igh **P**erformance **C**omputing [HPC] systems using Infrastructure-as-Code [IaC] principles, predominantly with Ansible. The aim is to follow the principles of Infrastructure as Code and apply them to HPC systems.
- name: mysql user
mysql_user: name='karaage' password={{ item }} priv=karaage.*:ALL state=present login_user=root login_password={{ sqlrootPasswd.stdout }}
with_items: karaageSqlPassword.stdout
By encoding the system state, the following advantages are gained, which also define the values of this project:
- **collaboration** it is simply easier to share code than systems. This repository also aims to serve as a place for backups and discussions, and in the near future even documentation :)
- **redeployability** if a system follows the same build recipes, and the deployment is automated, all installations should be similar, including Test and Production. You will also hear the term Immutable Infrastructure for this.
- **CICD automation** aka test and change automation, which allows us to put safeguards in place, increasing our quality and easing the burden of change management.
- **modular and reusable** for example, we currently support two operating systems as well as bare-metal and OpenStack-based deployments. AWS support is also in scope.
The scope of this repository ranges from an Ansible inventory (OpenStack and later AWS will also be supported) to a GPU-powered desktop provisioning system on top of the resource scheduler Slurm. Services such as an identity system or package repository management are in scope but remain future work.
We aim to make these roles run on all common Linux platforms (both RedHat- and Debian-derived), but at the very least they should work on a CentOS 6 install.
<a name="gettingstarted"></a>
## Getting Started TODO
OK, all caps incoming: TAKE CARE OF CICD/vars/passwords.yml. Best case, encrypt it with ansible-vault at the same time you start modifying that file.
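A minimal sketch of doing that from the repository root (the inventory and playbook names below are placeholders, not files shipped with this repo):

```
# encrypt the passwords file before committing any changes
ansible-vault encrypt CICD/vars/passwords.yml

# later, supply the vault password when running playbooks
ansible-playbook -i <your_inventory> <your_playbook>.yml --ask-vault-pass
```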
Given the current state of the documentation, please feel free to reach out and ask for clarification. This project is far from done or perfect!
YAML syntax can be checked at http://www.yamllint.com/.
<a name="Features"></a>
## Features
- Currently supports CentOS 7, CentOS 8 and Ubuntu 18.04
- CICD tested, including spawning a cluster, MPI and Slurm tests
- Rolling node updates are currently a work in progress.
- Coming up: Strudel2 desktop integration
<a name="Assumptions"></a>
## Assumptions
- The Ansible inventory file needs to contain the following node types: 1x [SQLNodes], 1x [NFSNodes], 2x [ManagementNodes], >=0 LoginNodes, >=0 ComputeNodes, >=0 VisNodes (GPU nodes). SQL and NFS can be combined. The ManagementNodes manage Slurm access and job submission. Here is an [example](docs/sampleinventory.yml).
- The filesystem layout assumes the following shared drives: /home, /projects for project data, /scratch for fast compute, and /usr/local for provided software; see [here](CICD/files/etcExports)
- TODO rewrite: a populated vars folder as in the CICD subfolder is required. See Configuration for details, and apologies for the current state.
- The software stack is currently provided as a mounted shared drive. Software can be loaded via Environment Modules. Containerisation is also heavily in use. Please reach out for further details and sharing requests.
- Networking is partially defined at this level and partially at the bare-metal level. For now, this is a point to reach out and discuss. Current implementations support public-facing LoginNodes with the rest in a private network behind a NAT gateway, OR a private InfiniBand network in conjunction with a 1G network.
- A shared SSH key for a management user (a quick connectivity check is sketched after this list)
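A minimal sketch of such a check, assuming an Ubuntu-based inventory like docs/sampleinventory.yml and a management key at ~/.ssh/cluster_key (both paths are assumptions):

```
# confirm the inventory and the shared management key work before running any roles
ansible all -i docs/sampleinventory.yml -u ubuntu --private-key ~/.ssh/cluster_key -m ping
```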
<a name="Configuration"></a>
## Configuration
Configuration is defined in the variables (vars) in the vars folder; see CICD/vars. These vars are evolving over time with the necessary refactoring and abstraction. The variables have grown organically as this set of Ansible roles was developed, first for a single system and then for multiple systems. The best way to interpret their usage currently is to search or `grep VARIABLENAME` across this repository and see how they are used (see the sketch after this list). This is not pretty for external use; let's not even pretend it is.
- filesystems.yml covers filesystem types and exports/mounts
- ldapConfig.yml covers identity via LDAP
- names.yml stores the domain name
- passwords.yml contains passwords in a single file, to be encrypted via ansible-vault
- slurm.yml contains all Slurm variables
- vars.yml contains variables which don't belong anywhere else
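For instance, a minimal sketch of tracing one variable through the repository (slurm_version is just an example variable name taken from CICD/vars):

```
# list every role, template and vars file that defines or consumes the variable
grep -rn "slurm_version" .
```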
<a name="Contribute"></a>
## How do I contribute or collaborate
- Get in contact and use the issue tracker; if you want to contribute documentation, code or anything else, we offer handholding for the first merge request.
- A great way to start is to get in contact, tell us what you want to know, and help us improve the documentation.
- Please contact us via andreas.hamacher(at)monash.edu or help(at)massive.org.au
- Contribution guidelines would also be a good contribution :)
<a name="partners"></a>
## Used by:
![Monash University](docs/images/monash-university-logo.png "monash.edu")
![MASSIVE](docs/images/massive-website-banner.png "massive.org.au")
![Australian Research Data Commons](docs/images/ardc.png "ardc.edu.au")
![University of Western Australia](docs/images/university-of-western-australia-logo.png "uwa.edu.au")
<a name="Coverage"></a>
## CI Coverage
- CentOS 7.8, CentOS 8, Ubuntu 18.04
- All node types as outlined in Assumptions
- vars for MASSIVE, MonARCH and a generic cluster (see files in CICD/vars)
- CD in progress using autoupdate.py
<a name="Roadmap"></a>
## Roadmap
- Soon this section is to be moved into the Milestones on GitLab.
- Desktop integration using Strudel2. Contributors are welcome to integrate OpenOnDemand.
- CVL integration see github.com/Characterisation-Virtual-Laboratory
- Automated Security Checks as part of the CICD pipeline
- Integration of a FOSS identity system; currently only a token LDAP setup is supported
- System status monitoring and alerting
[Nice read titled Infrastructure as Code DevOps principle: meaning, benefits, use cases](https://medium.com/@FedakV/infrastructure-as-code-devops-principle-meaning-benefits-use-cases-a4461a1fef2)
---
- name: "Check client ca certificate"
register: ca_cert
stat: "path={{ x509_cacert_file }}"
- name: "Check certificate and key"
shell: (openssl x509 -noout -modulus -in {{ x509_cert_file }} | openssl md5 ; openssl rsa -noout -modulus -in {{ x509_key_file }} | openssl md5) | uniq | wc -l
register: certcheck
- name: "Check certificate"
register: cert
stat: "path={{ x509_cert_file }}"
- name: "Check key"
register: key
stat: "path={{ x509_key_file }}"
sudo: true
- name: "Default: we don't need a new certificate"
set_fact: needcert=False
- name: "Set need cert if key is missing"
set_fact: needcert=True
when: key.stat.exists == false
- name: "set needcert if cert is missing"
set_fact: needcert=True
when: cert.stat.exists == false
- name: "set needcert if cert doesn't match key"
set_fact: needcert=True
when: certcheck.stdout == '2'
- name: "Creating Keypair"
shell: "echo noop when using easy-rsa"
when: needcert
- name: "Creating CSR"
shell: " cd /etc/easy-rsa/2.0; source ./vars; export EASY_RSA=\"${EASY_RSA:-.}\"; \"$EASY_RSA\"/pkitool --csr {{ x509_csr_args }} {{ common_name }}"
when: needcert
sudo: true
- name: "Copy CSR to ansible host"
fetch: "src=/etc/easy-rsa/2.0/keys/{{ common_name }}.csr dest=/tmp/{{ common_name }}/ fail_on_missing=yes validate_md5=yes flat=yes"
sudo: true
when: needcert
- name: "Copy CSR to CA"
delegate_to: "{{ x509_ca_server }}"
copy: "src=/tmp/{{ ansible_fqdn }}/{{ common_name }}.csr dest=/etc/easy-rsa/2.0/keys/{{ common_name }}.csr force=yes"
when: needcert
sudo: true
- name: "Sign Certificate"
delegate_to: "{{ x509_ca_server }}"
shell: "source ./vars; export EASY_RSA=\"${EASY_RSA:-.}\" ;\"$EASY_RSA\"/pkitool --sign {{ common_name }}"
args:
chdir: "/etc/easy-rsa/2.0"
sudo: true
when: needcert
- name: "Copy the Certificate to ansible host"
delegate_to: "{{ x509_ca_server }}"
fetch: "src=/etc/easy-rsa/2.0/keys/{{ common_name }}.crt dest=/tmp/{{ common_name }}/ fail_on_missing=yes validate_md5=yes flat=yes"
sudo: true
when: needcert
- name: "Copy the CA Certificate to the ansible host"
delegate_to: "{{ x509_ca_server }}"
fetch: "src=/etc/easy-rsa/2.0/keys/ca.crt dest=/tmp/ca.crt fail_on_missing=yes validate_md5=yes flat=yes"
sudo: true
when: "ca_cert.stat.exists == false"
- name: "Copy the certificate to the node"
copy: "src=/tmp/{{ common_name }}/{{ common_name }}.crt dest={{ x509_cert_file }} force=yes"
sudo: true
when: needcert
- name: "Copy the CA certificate to the node"
copy: "src=/tmp/ca.crt dest={{ x509_cacert_file }}"
sudo: true
when: "ca_cert.stat.exists == false"
- name: "Copy the key to the correct location"
shell: "mkdir -p `dirname {{ x509_key_file }}` ; chmod 700 `dirname {{ x509_key_file }}` ; cp /etc/easy-rsa/2.0/keys/{{ common_name }}.key {{ x509_key_file }}"
sudo: true
when: needcert
#!/usr/bin/env python
import sys, os, string, subprocess, socket, ansible.runner, re
import copy, shlex,uuid, random, multiprocessing, time, shutil
import novaclient.v1_1.client as nvclient
import novaclient.exceptions as nvexceptions
import glanceclient.v2.client as glclient
import keystoneclient.v2_0.client as ksclient
class Authenticate:
def __init__(self, username, passwd):
self.username=username
self.passwd=passwd
self.tenantName= os.environ['OS_TENANT_NAME']
self.authUrl="https://keystone.rc.nectar.org.au:5000/v2.0"
kc = ksclient.Client( auth_url=self.authUrl,
username=self.username,
password=self.passwd)
self.tenantList=kc.tenants.list()
self.novaSemaphore = multiprocessing.BoundedSemaphore(value=1)
def createNovaObject(self,tenantName):
for tenant in self.tenantList:
if tenant.name == tenantName:
try:
nc = nvclient.Client( auth_url=self.authUrl,
username=self.username,
api_key=self.passwd,
project_id=tenant.name,
tenant_id=tenant.id,
service_type="compute"
)
return nc
except nvexceptions.ClientException:
raise
def gatherInfo(self):
for tenant in self.tenantList: print tenant.name
tenantName = raw_input("Please select a project: (Default MCC-On-R@CMON):")
if not tenantName or tenantName not in [tenant.name for tenant in self.tenantList]:
tenantName = "MCC_On_R@CMON"
print tenantName,"selected\n"
## Fetch the Nova Object
nc = self.createNovaObject(tenantName)
## Get the Flavor
flavorList = nc.flavors.list()
for flavor in flavorList: print flavor.name
flavorName = raw_input("Please select a Flavor Name: (Default m1.xxlarge):")
if not flavorName or flavorName not in [flavor.name for flavor in flavorList]:
flavorName = "m1.xxlarge"
print flavorName,"selected\n"
## Get the Availability Zones
az_p1 = subprocess.Popen(shlex.split\
("nova availability-zone-list"),stdout=subprocess.PIPE)
az_p2 = subprocess.Popen(shlex.split\
("""awk '{if ($2 && $2 != "Name")print $2}'"""),\
stdin=az_p1.stdout,stdout=subprocess.PIPE)
availabilityZonesList = subprocess.Popen(shlex.split\
("sort"),stdin=az_p2.stdout,stdout=subprocess.PIPE).communicate()[0]
print availabilityZonesList
availabilityZone = raw_input("Please select an availability zone: (Default monash-01):")
if not availabilityZone or \
availabilityZone not in [ zone for zone in availabilityZonesList.split()]:
availabilityZone = "monash-01"
print availabilityZone,"selected\n"
## Get the number of instances to spawn
numberOfInstances = raw_input\
("Please specify the number of instances to launch: (Default 1):")
if not numberOfInstances or not numberOfInstances.isdigit():
numberOfInstances = 1
subprocess.call(['clear'])
flavorObj = nc.flavors.find(name=flavorName)
print "Creating",numberOfInstances,\
"instance(s) in",availabilityZone,"zone..."
instanceList = []
for counter in range(0,int(numberOfInstances)):
nodeName = "MCC-Node"+str(random.randrange(1,1000))
try:
novaInstance = nc.servers.create\
(name=nodeName,image="ddc13ccd-483c-4f5d-a5fb-4b968aaf385b",\
flavor=flavorObj,key_name="shahaan",\
availability_zone=availabilityZone)
instanceList.append(novaInstance)
except nvexceptions.ClientException:
raise
continue
while 'BUILD' in [novaInstance.status \
for novaInstance in instanceList]:
for count in range(0,len(instanceList)):
time.sleep(5)
if instanceList[count].status != 'BUILD':
continue
else:
try:
instanceList[count] = nc.servers.get(instanceList[count].id)
except (nvexceptions.ClientException,
nvexceptions.ConnectionRefused,
nvexceptions.InstanceInErrorState):
raise
del instanceList[count]
continue
activeHostsList = []
SSHports = []
for novaInstance in instanceList:
if novaInstance.status == 'ACTIVE':
hostname = socket.gethostbyaddr(novaInstance.networks.values()[0][0])[0]
activeHostsList.append(hostname)
SSHDict = {}
SSHDict['IP'] = novaInstance.networks.values()[0][0]
SSHDict['status'] = 'CLOSED'
SSHports.append(SSHDict)
print "Scanning if port 22 is open..."
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
while 'CLOSED' in [host['status'] for host in SSHports]:
for instance in range(0,len(SSHports)):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
if SSHports[instance]['status'] == 'CLOSED' and not sock.connect_ex((SSHports[instance]['IP'], 22)):
SSHports[instance]['status'] = 'OPEN'
print "Port 22, opened for IP:",SSHports[instance]['IP']
else:
time.sleep(5)
sock.close()
fr = open('/etc/ansible/hosts.rpmsave','r+')
fw = open('hosts.temp','w+')
lines = fr.readlines()
for line in lines:
fw.write(line)
if re.search('\[new-servers\]',line):
for host in activeHostsList: fw.write(host+'\n')
fr.close()
fw.close()
shutil.move('hosts.temp','/etc/ansible/hosts')
print "Building the Nodes now..."
subprocess.call(shlex.split("/mnt/nectar-nfs/root/swStack/ansible/bin/ansible-playbook /mnt/nectar-nfs/root/ansible-config-root/mcc-nectar-dev/buildNew.yml -v"))
if __name__ == "__main__":
username = os.environ['OS_USERNAME']
passwd = os.environ['OS_PASSWORD']
choice = raw_input(username + " ? (y/n):")
while choice and choice not in ("n","y"):
print "y or n please"
choice = raw_input()
if choice == "n":
username = raw_input("username :")
passwd = raw_input("password :")
auth = Authenticate(username, passwd)
auth.gatherInfo()
0,0,0,1,1,1,1,1,1,0
0,0,0,0,0,0,1,1,0,0
0,0,0,0,1,1,1,1,0,0
1,0,0,0,0,0,0,1,0,0
1,0,1,0,0,0,0,1,0,0
1,0,1,0,0,0,0,1,0,0
1,1,1,0,0,0,0,1,1,0
1,1,1,1,1,1,1,0,0,0
1,0,0,0,0,0,1,0,0,0
0,0,0,0,0,0,0,0,0,0
docs/ChordDiagramm/Chord_Diagramm.png (binary image, 1.87 MiB)
#!/usr/bin/env python3
# script copied from https://github.com/fengwangPhysics/matplotlib-chord-diagram/blob/master/README.md
# source data manually edited via https://docs.google.com/spreadsheets/d/1JN9S_A5ICPQOvgyVbWJSFJiw-5gO2vF-4AeYuWl-lbs/edit#gid=0
# chord diagram
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import numpy as np
LW = 0.3
def polar2xy(r, theta):
return np.array([r*np.cos(theta), r*np.sin(theta)])
def hex2rgb(c):
return tuple(int(c[i:i+2], 16)/256.0 for i in (1, 3 ,5))
def IdeogramArc(start=0, end=60, radius=1.0, width=0.2, ax=None, color=(1,0,0)):
# start, end should be in [0, 360)
if start > end:
start, end = end, start
start *= np.pi/180.
end *= np.pi/180.
# optimal distance to the control points
# https://stackoverflow.com/questions/1734745/how-to-create-circle-with-b%C3%A9zier-curves
opt = 4./3. * np.tan((end-start)/ 4.) * radius
inner = radius*(1-width)
verts = [
polar2xy(radius, start),
polar2xy(radius, start) + polar2xy(opt, start+0.5*np.pi),
polar2xy(radius, end) + polar2xy(opt, end-0.5*np.pi),
polar2xy(radius, end),
polar2xy(inner, end),
polar2xy(inner, end) + polar2xy(opt*(1-width), end-0.5*np.pi),
polar2xy(inner, start) + polar2xy(opt*(1-width), start+0.5*np.pi),
polar2xy(inner, start),
polar2xy(radius, start),
]
codes = [Path.MOVETO,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.LINETO,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CLOSEPOLY,
]
if ax == None:
return verts, codes
else:
path = Path(verts, codes)
patch = patches.PathPatch(path, facecolor=color+(0.5,), edgecolor=color+(0.4,), lw=LW)
ax.add_patch(patch)
def ChordArc(start1=0, end1=60, start2=180, end2=240, radius=1.0, chordwidth=0.7, ax=None, color=(1,0,0)):
# start, end should be in [0, 360)
if start1 > end1:
start1, end1 = end1, start1
if start2 > end2:
start2, end2 = end2, start2
start1 *= np.pi/180.
end1 *= np.pi/180.
start2 *= np.pi/180.
end2 *= np.pi/180.
opt1 = 4./3. * np.tan((end1-start1)/ 4.) * radius
opt2 = 4./3. * np.tan((end2-start2)/ 4.) * radius
rchord = radius * (1-chordwidth)
verts = [
polar2xy(radius, start1),
polar2xy(radius, start1) + polar2xy(opt1, start1+0.5*np.pi),
polar2xy(radius, end1) + polar2xy(opt1, end1-0.5*np.pi),
polar2xy(radius, end1),
polar2xy(rchord, end1),
polar2xy(rchord, start2),
polar2xy(radius, start2),
polar2xy(radius, start2) + polar2xy(opt2, start2+0.5*np.pi),
polar2xy(radius, end2) + polar2xy(opt2, end2-0.5*np.pi),
polar2xy(radius, end2),
polar2xy(rchord, end2),
polar2xy(rchord, start1),
polar2xy(radius, start1),
]
codes = [Path.MOVETO,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
]
if ax == None:
return verts, codes
else:
path = Path(verts, codes)
patch = patches.PathPatch(path, facecolor=color+(0.5,), edgecolor=color+(0.4,), lw=LW)
ax.add_patch(patch)
def selfChordArc(start=0, end=60, radius=1.0, chordwidth=0.7, ax=None, color=(1,0,0)):
# start, end should be in [0, 360)
if start > end:
start, end = end, start
start *= np.pi/180.
end *= np.pi/180.
opt = 4./3. * np.tan((end-start)/ 4.) * radius
rchord = radius * (1-chordwidth)
verts = [
polar2xy(radius, start),
polar2xy(radius, start) + polar2xy(opt, start+0.5*np.pi),
polar2xy(radius, end) + polar2xy(opt, end-0.5*np.pi),
polar2xy(radius, end),
polar2xy(rchord, end),
polar2xy(rchord, start),
polar2xy(radius, start),
]
codes = [Path.MOVETO,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
Path.CURVE4,
]
if ax == None:
return verts, codes
else:
path = Path(verts, codes)
patch = patches.PathPatch(path, facecolor=color+(0.5,), edgecolor=color+(0.4,), lw=LW)
ax.add_patch(patch)
def chordDiagram(X, ax, colors=None, width=0.1, pad=2, chordwidth=0.7):
"""Plot a chord diagram
Parameters
----------
X :
flux data, X[i, j] is the flux from i to j
ax :
matplotlib `axes` to show the plot
colors : optional
user defined colors in rgb format. Use function hex2rgb() to convert hex color to rgb color. Default: d3.js category10
width : optional
width/thickness of the ideogram arc
pad : optional
gap pad between two neighboring ideogram arcs, unit: degree, default: 2 degree
chordwidth : optional
position of the control points for the chords, controlling the shape of the chords
"""
# X[i, j]: i -> j
x = X.sum(axis = 1) # sum over rows
ax.set_xlim(-1.1, 1.1)
ax.set_ylim(-1.1, 1.1)
if colors is None:
# use d3.js category10 https://github.com/d3/d3-3.x-api-reference/blob/master/Ordinal-Scales.md#category10
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
'#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf', '#c49c94']
if len(x) > len(colors):
print('x is too large! Use x smaller than 11')
colors = [hex2rgb(colors[i]) for i in range(len(x))]
# find position for each start and end
y = x/np.sum(x).astype(float) * (360 - pad*len(x))
pos = {}
arc = []
nodePos = []
start = 0
for i in range(len(x)):
end = start + y[i]
arc.append((start, end))
angle = 0.5*(start+end)
#print(start, end, angle)
if -30 <= angle <= 210:
angle -= 90
else:
angle -= 270
nodePos.append(tuple(polar2xy(1.1, 0.5*(start+end)*np.pi/180.)) + (angle,))
z = (X[i, :]/x[i].astype(float)) * (end - start)
ids = np.argsort(z)
z0 = start
for j in ids:
pos[(i, j)] = (z0, z0+z[j])
z0 += z[j]
start = end + pad
for i in range(len(x)):
start, end = arc[i]
IdeogramArc(start=start, end=end, radius=1.0, ax=ax, color=colors[i], width=width)
start, end = pos[(i,i)]
selfChordArc(start, end, radius=1.-width, color=colors[i], chordwidth=chordwidth*0.7, ax=ax)
for j in range(i):
color = colors[i]
if X[i, j] > X[j, i]:
color = colors[j]
start1, end1 = pos[(i,j)]
start2, end2 = pos[(j,i)]
ChordArc(start1, end1, start2, end2,
radius=1.-width, color=color, chordwidth=chordwidth, ax=ax)
#print(nodePos)
return nodePos
##################################
if __name__ == "__main__":
fig = plt.figure(figsize=(6,6))
flux = np.array([
[ 0, 1, 0, 0], #OS Sum:2 ; Centos, Ubuntu
[ 0, 0, 0, 0], #Plays
[ 0, 0, 0, 1], # Cluster: Sum5; Generic, M3, Monarch, SHPC, ACCS
[ 0, 0, 1, 2] #Cloud Sum3: AWS,Nimbus,Nectar
])
from numpy import genfromtxt
flux = genfromtxt('Chord_Diagramm - Sheet1.csv', delimiter=',')
ax = plt.axes([0,0,1,1])
#nodePos = chordDiagram(flux, ax, colors=[hex2rgb(x) for x in ['#666666', '#66ff66', '#ff6666', '#6666ff']])
nodePos = chordDiagram(flux, ax)
ax.axis('off')
prop = dict(fontsize=16*0.8, ha='center', va='center')
nodes = ['OS_Centos76','OS_Centos8','OS_Ubuntu1804','PLY_NFSSQL','PLY_MGMT','PLY_Login','PLY_Compute','C_Generic','C_M3','C_Monarch']
#nodes = ['M3_MONARCH','SHPC','Ubuntu','Centos7','Centos8','Tested','Security','Nectar','?AWS?','DGX@Baremetal','ML@M3','CVL@UWA','CVL_SW','CVL_Desktop','Strudel','/usr/local']
for i in range(len(nodes)):
ax.text(nodePos[i][0], nodePos[i][1], nodes[i], rotation=nodePos[i][2], **prop)
plt.savefig("Chord_Diagramm.png", dpi=600,transparent=False,bbox_inches='tight', pad_inches=0.02)
plt.show()
docs/images/ardc.png (binary image, 8.87 KiB)
docs/images/massive-website-banner.png (binary image, 22 KiB)
docs/images/monash-university-logo.png (binary image, 8.27 KiB)
docs/images/university-of-western-australia-logo.png (binary image, 22.9 KiB)
[SQLNodes]
sql1 ansible_host=192.168.0.1 ansible_user=ubuntu
[NFSNodes]
nfs11 ansible_host=192.168.0.2 ansible_user=ubuntu
[ManagementNodes]
mgmt1 ansible_host=192.168.0.3 ansible_user=ubuntu
mgmt2 ansible_host=192.168.0.4 ansible_user=ubuntu
[LoginNodes]
login1 ansible_host=192.168.0.5 ansible_user=ubuntu
[ComputeNodes]
compute1 ansible_host=192.168.0.6 ansible_user=ubuntu
#!/usr/bin/env python
import sys, os, string, subprocess, socket, re
import copy, shlex,uuid, random, multiprocessing, time, shutil, json
import novaclient.v1_1.client as nvclient
import novaclient.exceptions as nvexceptions
from keystoneclient.auth.identity import v2 as v2_auth
from heatclient import client as heat_client
from keystoneclient import session as kssession
class OpenStackConnection:
def __init__(self, username, passwd):
self.username=username
self.passwd=passwd
self.tenantName= os.environ['OS_TENANT_NAME']
self.tenantID= os.environ['OS_TENANT_ID']
self.authUrl="https://keystone.rc.nectar.org.au:5000/v2.0"
def _get_keystone_v2_auth(self, v2_auth_url, **kwargs):
auth_token = kwargs.pop('auth_token', None)
tenant_id = kwargs.pop('project_id', None)
tenant_name = kwargs.pop('project_name', None)
if auth_token:
return v2_auth.Token(v2_auth_url, auth_token,
tenant_id=tenant_id,
tenant_name=tenant_name)
else:
return v2_auth.Password(v2_auth_url,
username=kwargs.pop('username', None),
password=kwargs.pop('password', None),
tenant_id=tenant_id,
tenant_name=tenant_name)
def _get_keystone_session(self, **kwargs):
# first create a Keystone session
cacert = kwargs.pop('cacert', None)
cert = kwargs.pop('cert', None)
key = kwargs.pop('key', None)
insecure = kwargs.pop('insecure', False)
timeout = kwargs.pop('timeout', None)
verify = kwargs.pop('verify', None)
# FIXME(gyee): this code should come from keystoneclient
if verify is None:
if insecure:
verify = False
else:
# TODO(gyee): should we do
# heatclient.common.http.get_system_ca_fle()?
verify = cacert or True
if cert and key:
# passing cert and key together is deprecated in favour of the
# requests lib form of having the cert and key as a tuple
cert = (cert, key)
return kssession.Session(verify=verify, cert=cert, timeout=timeout)
def _get_keystone_auth(self, session, auth_url, **kwargs):
# FIXME(dhu): this code should come from keystoneclient
# discover the supported keystone versions using the given url
v2_auth_url=auth_url
v3_auth_url=None
# Determine which authentication plugin to use. First inspect the
# auth_url to see the supported version. If both v3 and v2 are
# supported, then use the highest version if possible.
auth = None
if v3_auth_url and v2_auth_url:
user_domain_name = kwargs.get('user_domain_name', None)
user_domain_id = kwargs.get('user_domain_id', None)
project_domain_name = kwargs.get('project_domain_name', None)
project_domain_id = kwargs.get('project_domain_id', None)
# support both v2 and v3 auth. Use v3 if domain information is
# provided.
if (user_domain_name or user_domain_id or project_domain_name or
project_domain_id):
auth = self._get_keystone_v3_auth(v3_auth_url, **kwargs)
else:
auth = self._get_keystone_v2_auth(v2_auth_url, **kwargs)
elif v3_auth_url:
# support only v3
auth = self._get_keystone_v3_auth(v3_auth_url, **kwargs)
elif v2_auth_url:
# support only v2
auth = self._get_keystone_v2_auth(v2_auth_url, **kwargs)
else:
raise exc.CommandError(_('Unable to determine the Keystone '
'version to authenticate with using the '
'given auth_url.'))
return auth
def get_stack_name(self,stack):
stacks=[]
for s in self.hc.stacks.list():
stacks.append(s.stack_name)
if stack in stacks:
return stack
elif len(stacks)==1:
return stacks[0]
elif len(stacks)==0:
raise Exception("You do not have any heat stacks in your OpenStack Project")
else:
raise Exception("You have multiple heat stacks in your OpenStack Project and I'm not sure which one to use.\n You can select a stack by symlinking to a stack, for example if you have a stack called mycluster do ln -s %s mycluster\n"%stack)
def auth(self):
self.nc = nvclient.Client( auth_url=self.authUrl,
username=self.username,
api_key=self.passwd,
project_id=self.tenantName,
tenant_id=self.tenantID,
service_type="compute"
)
kwargs = {
'insecure': False,
}
keystone_session = self._get_keystone_session(**kwargs)
kwargs = {
'username': self.username,
'password': self.passwd,
'project_id': self.tenantID,
'project_name': self.tenantName
}
keystone_auth = self._get_keystone_auth(keystone_session,
self.authUrl,
**kwargs)
endpoint = keystone_auth.get_endpoint(keystone_session,service_type='orchestration', region_name=None)
kwargs = {
'username': self.username,
'include_pass': False,
'session': keystone_session,
'auth_url': self.authUrl,
'region_name': '',
'endpoint_type': 'publicURL',
'service_type': 'orchestration',
'password': self.passwd,
'auth': keystone_auth,
}
api_version=1
self.hc = heat_client.Client(api_version, endpoint, **kwargs)
def recurse_resources(self,stack,resource):
result=[]
if 'OS::Nova::Server' in resource.resource_type:
result.append(resource.physical_resource_id)
if 'OS::Heat::ResourceGroup' in resource.resource_type:
for r in self.hc.resources.list(resource.physical_resource_id):
result.extend(self.recurse_resources(stack,r))
return result
def gatherInfo(self,stack_name):
## Fetch the Nova Object
instance_ids=[]
for i in self.hc.stacks.list():
if i.stack_name == stack_name:
for r in self.hc.resources.list(i.stack_name):
instance_ids.extend(self.recurse_resources(stack=i,resource=r))
nc=self.nc
inventory = {}
inventory['_meta'] = { 'hostvars': {} }
for server in nc.servers.list():
if server.id in instance_ids:
if server.metadata and 'ansible_host_groups' in server.metadata:
hostname=server.name
groups = json.loads(server.metadata['ansible_host_groups'])
for group in groups:
if group in inventory:
inventory[group].append(hostname)
else:
inventory[group] = [ hostname ]
elif server.metadata and 'ansible_host_group' in server.metadata:
#hostname = socket.gethostbyaddr(server.networks.values()[0][0])[0]
hostname = server.name
# Set Ansible Host Group
if server.metadata['ansible_host_group'] in inventory:
inventory[server.metadata['ansible_host_group']].append(hostname)
else:
inventory[server.metadata['ansible_host_group']] = [hostname]
# Set the other host variables
inventory['_meta']['hostvars'][hostname] = {}
inventory['_meta']['hostvars'][hostname]['ansible_ssh_host'] = server.networks.values()[0][0]
inventory['_meta']['hostvars'][hostname]['ansible_remote_tmp'] = '/tmp/ansible'
for key in server.metadata.keys():
if 'ansible_ssh' in key:
inventory['_meta']['hostvars'][hostname][key] = server.metadata[key]
inventory['_meta']['hostvars'][hostname]['ansible_ssh_user'] = 'ec2-user'
print json.dumps(inventory)
if __name__ == "__main__":
stack_name=os.path.basename(sys.argv[0])
username = os.environ['OS_USERNAME']
passwd = os.environ['OS_PASSWORD']
openstack = OpenStackConnection(username, passwd)
openstack.auth()
stack_name=openstack.get_stack_name(stack_name)
openstack.gatherInfo(stack_name)
---
description: " A simple template to boot a 3 node cluster"
heat_template_version: 2013-05-23
parameters:
image_id:
type: string
label: Image ID
description: Image to be used for compute instance
default: a5e74703-f343-415a-aa23-bd0f0aacfc9e
key_name:
type: string
label: Key Name
description: Name of key-pair to be used for compute instance
default: shahaan
availability_z:
type: string
label: Availability Zone
description: Availability Zone to be used for launching compute instance
default: monash-01
resources:
headNode:
type: "OS::Nova::Server"
properties:
availability_zone: { get_param: availability_z }
flavor: m1.small
image: { get_param: image_id }
key_name: { get_param: key_name }
metadata:
ansible_host_group: headNode
ansible_ssh_user: ec2-user
ansible_ssh_private_key_file: /home/sgeadmin/.ssh/shahaan.pem
headVolume:
type: OS::Cinder::Volume
properties:
availability_zone: { get_param: availability_z }
description: Volume that will attach the headNode
name: headNodeVolume
size: 50
volumeAttachment:
type: OS::Cinder::VolumeAttachment
properties:
instance_uuid: { get_resource: headNode }
volume_id: { get_resource: headVolume }
---
-
hosts: openvpn-servers
remote_user: ec2-user
roles:
#- OpenVPN-Server
- nfs-server
sudo: true
vars:
x509_ca_server: vm-118-138-240-224.erc.monash.edu.au
-
hosts: openvpn-clients
remote_user: ec2-user
roles:
#- easy-rsa-common
#- easy-rsa-certificate
#- OpenVPN-Client
- syncExports
- nfs-client
sudo: true
vars:
x509_ca_server: vm-118-138-240-224.erc.monash.edu.au
openvpn_servers: ['vm-118-138-240-224.erc.monash.edu.au']
nfs_server: "vm-118-138-240-224.erc.monash.edu.au"
- hosts: 'ComputeNodes,DGXRHELNodes'
gather_facts: false
tasks:
- include_vars: vars/ldapConfig.yml
- include_vars: vars/filesystems.yml
- include_vars: vars/slurm.yml
- include_vars: vars/vars.yml
- { name: set use shared state, set_fact: usesharedstatedir=False }
tags: [ never ]
# these are just templates. Note the tag never! Everything with never is only executed if called explicitly, i.e. ansible-playbook --tags=foo,bar OR --tags=tag_group
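# example invocations for the template tasks below (the playbook and inventory names are placeholders):
#   ansible-playbook -i inventory maintenance.yml --tags=uniquetag_foo
#   ansible-playbook -i inventory maintenance.yml --tags=tag_group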
- hosts: 'ComputeNodes,DGXRHELNodes'
gather_facts: false
tasks:
- { name: template_shell, shell: ls, tags: [never,tag_group,uniquetag_foo] }
- { name: template_command, command: uname chdir=/bin, tags: [never,tag_group,uniquetag_bar] }
- hosts: 'ComputeNodes,LoginNodes,DGXRHELNodes'
gather_facts: false
tasks:
- { name: kill user bash shells, shell: 'ps aux | grep -i -e bash -e vscode-server -e zsh -e tmux -e sftp-server -e trungn | grep -v -e "ec2-user" -e ubuntu -e philipc -e smichnow | grep -v "root" | sed "s/\ \ */\ /g" | cut -f 2 -d " " | xargs -I{} kill -09 {}', become: true, become_user: root, tags: [never,kickshells]}
- { name: Disable MonARCH Lustre Cron Check, cron: name="Check dmesg for lustre errors" state=absent,become_user: root,become: True ,tags: [never, monarch_disable] }
- name: Re-enable MonARCH Lustre Cron Check
cron: name="Check dmesg for lustre errors" minute="*/5" job="/usr/local/sbin/check_lustre_dmesg.sh >> /tmp/check_lustre_output.txt 2>&1"
become: true
become_user: root
tags: [never, monarch_enable ]
- hosts: 'ManagementNodes'
gather_facts: false
tasks:
- name: prep a mgmt node for shutdown (DO NOT FORGET TO LIMIT; gluster needs 2 out of 3 to run)
block:
# the failover actually works, but it only takes down the primary; so if this were called from the backup, all of slurm would go down
#- { name: force a failover, shell: /opt/slurm-19.05.4/bin/scontrol takeover }
- { name: stop slurmdbd service, service: name=slurmdbd state=stopped }
- { name: stop slurmctld service, service: name=slurmctld state=stopped }
- { name: stop glusterd service, service: name=glusterd state=stopped }
- { name: stop glusterfsd service, service: name=glusterfsd state=stopped }
become: true
tags: [never,prepmgmtshutdown]
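# example invocation for the task above (the playbook and inventory names are placeholders):
#   ansible-playbook -i inventory maintenance.yml --tags=prepmgmtshutdown --limit=mgmt1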
- name: verify a mgmt node came up well
block:
# TODO verify vdb is mounted
- { name: start glusterd service, service: name=glusterd state=started }
- { name: start glusterfsd service, service: name=glusterfsd state=started }
- { name: start slurmctld service, service: name=slurmctld state=started }
- { name: start slurmdbd service, service: name=slurmdbd state=started }
become: true
tags: [never,verifymgmtNode]
- hosts: 'SQLNodes'
gather_facts: false
tasks:
- name: prep a sqlnode node for shutdown
block:
- { name: stop mariadb service, service: name=mariadb state=stopped }
- { name: stop glusterd service, service: name=glusterd state=stopped }
- { name: stop glusterfsd service, service: name=glusterfsd state=stopped }
become: true
tags: [never,prepsqlshutdown]
- name: verify an sql node after a restart
block:
- { name: ensure mariadb service runs, service: name=mariadb state=started }
- { name: ensure glusterd service runs, service: name=glusterd state=started }
- { name: ensure glusterfsd service runs, service: name=glusterfsd state=started }
become: true
tags: [never,sqlverify]
- hosts: 'LoginNodes:!perfsonar01'
gather_facts: false
tasks:
- name: set nologin
block:
- include_vars: vars/slurm.yml
- { name: populate nologin file, shell: 'echo "{{ clustername }} is down for a scheduled maintenance." > /etc/nologin', become: true, become_user: root }
- { name: set attribute immutable so will not be deleted, shell: 'chattr +i /etc/nologin', become: true, become_user: root }
become: true
tags: [never,setnologin]
- name: remove nologin
block:
- { name: unset attribute immutable to allow deletion, shell: 'chattr -i /etc/nologin', become: true, become_user: root }
- { name: remove nologin file, file: path=/etc/nologin state=absent, become: true, become_user: root }
become: true
tags: [never,removenologin]
- name: terminate user ssh processes
block:
- { name: kill shells, shell: 'ps aux | grep -i bash | grep -v "ec2-user" | grep -v "root" | sed "s/\ \ */\ /g" | cut -f 2 -d " " | xargs -I{} kill -09 {}', become: true, become_user: root }
- { name: kill rsync sftp scp, shell: 'ps aux | egrep "sleep|sh|rsync|sftp|scp|sftp-server|sshd" | grep -v "ec2-user" | grep -v "root" | sed "s/\ \ */\ /g" | cut -f 2 -d " " | xargs -I{} kill -09 {}', become: true, become_user: root }
- { name: kill vscode, shell: 'pgrep -f vscode | xargs -I{} kill -09 {}', become: true, become_user: root, ignore_errors: true }
become: true
tags: [never,terminateusersshscprsync]
- hosts: 'LoginNodes,ComputeNodes,DGXRHELNodes,GlobusNodes'
gather_facts: false
tasks:
- name: stop lustre and disable service
block:
- { name: stop and disable lustre service, service: name=lustre-client enabled=False state=stopped }
become: true
tags: [never,stopdisablelustre]
- name: start lustre and enable service
block:
- { name: start and enable lustre service, service: name=lustre-client enabled=True state=started }
become: true
tags: [never,startenablelustre16Aug]
- hosts: 'ComputeNodes,LoginNodes,DGXRHELNodes'
gather_facts: false
tasks:
- { name: disable_lustre_service, service: name=lustre-client enabled=no, tags: [never,disable_lustre_service] }
- hosts: 'ComputeNodes,LoginNodes,DGXRHELNodes,ManagementNodes'
gather_facts: false
tasks:
- { name: umount /home, mount: path=/home state=unmounted, become: true, become_user: root, tags: [never,umount_home] }
#this should not really end up in the main branch but it does not hurt if it does
- hosts: 'ComputeNodes,LoginNodes,DGXRHELNodes,ManagementNodes'
gather_facts: false
tasks:
- { name: umount local-legacy, mount: path=/usr/local-legacy state=absent, become: true, become_user: root, tags: [never,umount_locallegacy] }
#!/bin/sh
#
#mount | grep gvfs | while read -r line ;
#do
# read -ra line_array <<< $line
# echo "umount ${line_array[2]}"
#done
#un-stuck yum
#mv /var/lib/rpm/__db* /tmp/
#mv /var/lib/rpm/.rpm.lock /tmp/
#mv /var/lib/rpm/.dbenv.lock /tmp
#yum clean all
---
- hosts: all
vars_files:
- massive_var/main.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
slurmctrl: "{{ groups['ManagementNodes'][0] }}"
slurmqueues:
- {name: batch, group: ComputeNodes, default: true}
roles:
- { role: etcHosts, domain: "{{ ldapDomain }}" }
- hosts: 'ManagementNodes'
vars_files:
- massive_var/main.yml
- massive_var/package.yml
- massive_var/passwords.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
slurmctrl: "{{ groups['ManagementNodes'][0] }}"
slurmqueues:
- {name: batch, group: ComputeNodes, default: true}
- {name: dev, group: ComputeNodesDev, default: false}
- {name: multicore, group: ComputeNodesLarge, default: false}
mkFileSystems:
- {fstype : 'ext4', dev: '/dev/vdc1', opts: ''}
- {fstype : 'ext4', dev: '/dev/vdc2', opts: ''}
- {fstype : 'ext4', dev: '/dev/vdc3', opts: ''}
mountFileSystems:
- {fstype : 'ext4', dev: '/dev/vdc1', opts: 'defaults,nofail', name: '/cvl/scratch'}
- {fstype : 'ext4', dev: '/dev/vdc2', opts: 'defaults,nofail', name: '/cvl/home'}
- {fstype : 'ext4', dev: '/dev/vdc3', opts: 'defaults,nofail', name: '/cvl/local'}
roles:
- { role: easy-rsa-CA }
- { role: OpenVPN-Server }
- { role: ntp }
- { role: openLdapClient }
- { role: slurm-build }
- { role: nfs-server, configDiskDevice: true }
- { role: slurm, slurm_use_vpn: true}
- { role: installPackage, yumGroupPackageList: ['CVL Pre-installation', 'CVL Base Packages'], cliCopy: {'run': 'cp -r /usr/local/Modules/modulefiles/cvl /usr/local/Modules/modulefiles/massive', 'check': '/usr/local/Modules/modulefiles/massive'} }
- hosts: all
vars_files:
- massive_var/main.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
roles:
- { role: etcHosts, domain: "{{ ldapDomain }}" }
- hosts: 'ComputeNodes*'
vars_files:
- massive_var/main.yml
- massive_var/passwords.yml
- massive_var/package.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
roles:
- { role: OpenVPN-Client }
- hosts: 'LoginNodes'
vars_files:
- massive_var/main.yml
- massive_var/passwords.yml
- massive_var/package.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
roles:
- { role: OpenVPN-Client }
- hosts: all
vars_files:
- massive_var/main.yml
- massive_var/passwords.yml
- massive_var/package.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
nfs_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
groupList:
- { name : 'ComputeNodes', interface : 'tun0' }
- { name : 'ComputeNodesDev', interface : 'tun0' }
- { name : 'ComputeNodesLarge', interface : 'tun0' }
- { name : 'LoginNodes', interface : 'tun0' }
exportList:
- { name: '/usr/local', src: '/cvl/local', fstype: 'nfs4', opts: 'defaults,ro,nofail', interface : 'tun0', srvopts: 'ro,sync' }
- { name: '/home', src: '/cvl/home', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
- { name: '/scratch', src: '/cvl/scratch', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
roles:
- { role: etcHosts, domain: "{{ ldapDomain }}" }
- { role: syncExports }
- hosts: 'ComputeNodes'
vars_files:
- massive_var/main.yml
- massive_var/passwords.yml
- massive_var/package.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
slurmctrl: "{{ groups['ManagementNodes'][0] }}"
slurmqueues:
- {name: batch, group: ComputeNodes, default: true}
nfs_server: "{{ groups['ManagementNodes'][0] }}"
groupList:
- { name : 'ComputeNodes', interface : 'tun0' }
exportList:
- { name: '/usr/local', src: '/cvl/local', fstype: 'nfs4', opts: 'defaults,ro,nofail', interface : 'tun0', srvopts: 'ro,sync' }
- { name: '/home', src: '/cvl/home', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
- { name: '/scratch', src: '/cvl/scratch', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
roles:
- { role: ntp }
- { role: openLdapClient }
- { role: nfs-client }
- { role: slurm, slurm_use_vpn: true}
- { role: installPackage, preInstallation: "umount /usr/local", postInstallation: "mount /usr/local", yumGroupPackageList: ["CVL Pre-installation", "CVL Base Packages"], cliFileCopy: {'src': '/tmp/gconf_path', 'dest': '/etc/gconf/2/path'} }
- hosts: 'ComputeNodesDev'
vars_files:
- massive_var/main.yml
- massive_var/passwords.yml
- massive_var/package.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
slurmctrl: "{{ groups['ManagementNodes'][0] }}"
slurmqueues:
- {name: dev, group: ComputeNodesDev, default: false}
nfs_server: "{{ groups['ManagementNodes'][0] }}"
groupList:
- { name : 'ComputeNodes', interface : 'tun0' }
exportList:
- { name: '/home', src: '/cvl/home', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
- { name: '/scratch', src: '/cvl/scratch', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
roles:
- { role: ntp }
- { role: openLdapClient }
- { role: nfs-client }
- { role: slurm, slurm_use_vpn: true}
- { role: installPackage, preInstallation: "umount /usr/local", postInstallation: "mount /usr/local", yumGroupPackageList: ["CVL Pre-installation", "CVL Base Packages"], cliFileCopy: {'src': '/tmp/gconf_path', 'dest': '/etc/gconf/2/path'} }
- hosts: 'ComputeNodesLarge'
vars_files:
- massive_var/main.yml
- massive_var/passwords.yml
- massive_var/package.yml
vars:
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
slurmctrl: "{{ groups['ManagementNodes'][0] }}"
slurmqueues:
- {name: multicore, group: ComputeNodesLarge, default: false}
nfs_server: "{{ groups['ManagementNodes'][0] }}"
groupList:
- { name : 'ComputeNodes', interface : 'tun0' }
exportList:
- { name: '/usr/local', src: '/cvl/local', fstype: 'nfs4', opts: 'defaults,ro,nofail', interface : 'tun0', srvopts: 'ro,sync' }
- { name: '/home', src: '/cvl/home', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
- { name: '/scratch', src: '/cvl/scratch', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
roles:
- { role: ntp }
- { role: openLdapClient }
- { role: nfs-client }
- { role: slurm, slurm_use_vpn: true}
- { role: installPackage, preInstallation: "umount /usr/local", postInstallation: "mount /usr/local", yumGroupPackageList: ["CVL Pre-installation", "CVL Base Packages"], cliFileCopy: {'src': '/tmp/gconf_path', 'dest': '/etc/gconf/2/path'} }
- hosts: 'LoginNodes'
vars_files:
- massive_var/main.yml
- massive_var/passwords.yml
vars:
groupList:
- { name : 'ComputeNodes', interface : 'tun0' }
x509_ca_server: "{{ groups['ManagementNodes'][0] }}"
openvpn_servers: "{{ groups['ManagementNodes'] }}"
slurmctrl: "{{ groups['ManagementNodes'][0] }}"
slurmqueues:
- {name: batch, group: ComputeNodes, default: true}
exportList:
- { name: '/home', src: '/cvl/home', fstype: 'nfs4', opts: 'defaults,nofail', interface : 'tun0', srvopts: 'rw,root_squash,sync' }
roles:
- { role: ntp }
- { role: openLdapClient }
- { role: nfs-client }
- { role: slurm, slurm_use_vpn: true}
- { role: installPackage, importRepo: { command: "wget http://cvlrepo.massive.org.au/repo/cvl.repo -O", destination: "/etc/yum.repos.d/cvl.repo" }, yumGroupPackageList: ['CVL Pre-installation', 'CVL Base Packages'], cliCopy: {'run': 'cp -r /usr/local/Modules/modulefiles/cvl /usr/local/Modules/modulefiles/massive', 'check': '/usr/local/Modules/modulefiles/massive'} }
---
ldapServerHostIpLine: "130.220.209.234 m2-w.massive.org.au"
ldapCaCertSrc: "/tmp/m1-w-ca.pem"
countryName: "AU"
reginalName: "Victoria"
cityName: "Melbourne"
organizationName: "Monash University"
emailAddress: "help@massive.org.au"
organizationUnit: "MASSIVE"
nfsServerIpAddress: m2-login3.massive.org.au
x509_cert_file: "/etc/openvpn/certs/{{ x509_ca_server }}.crt"
x509_key_file: "/etc/openvpn/private/{{ x509_ca_server }}.key"
x509_cacert_file: "/etc/ssl/certs/ca_{{ x509_ca_server }}.crt"
###x509_common_name: "{{ x509_ca_server }}CommonName"
x509_common_name: "{{ inventory_hostname }}"
x509_csr_args: "--server"
x509_sign_args: "{{ x509_csr_args }}"
dhparms_file: "/etc/openvpn/private/dh.pem"
server_network: "10.8.0.0"
server_netmask: "255.255.255.0"
slurm_version: 14.11.2
munge_version: 0.5.11
userRelocationName: "ec2-user"
userNewHome: "/local_home"
#nfs_type: "nfs4"
#nfs_options: "defaults"
#nfs_server: "m2-login3.massive.org.au"
ldapServerHost: "130.220.209.234 m2-w.massive.org.au"
ldapDomain: "massive.org.au"
ldapURI: "ldaps://m2-w.massive.org.au:1637/"
ldapBindDN: "cn=ldapbind,cn=users,dc=massive,dc=org,dc=au"
ldapBase: "cn=users,dc=massive,dc=org,dc=au"
ldapUserClass: "user"
ldapUserHomeDirectory: "unixHomeDirectory"
ldapUserPricipal: "userPrincipalName"
ldapGroupBase: "ou=groups,dc=massive,dc=org,dc=au"
tlsCaCertDirectory: "/etc/openldap/certs"
ldapCaCertFile: "/etc/openldap/certs/m1-w-ca.pem"
ldapCaCertFileSource: "/tmp/cvl2server/m1-w-ca.pem"
cacertFile: "cacert.pem"
#domain: "cvl.massive.org.au"
domain: "massive.org.au"
ldapRfc2307: |
ldap_schema = rfc2307
ldap_search_base = cn=users,dc=massive,dc=org,dc=au
ldap_user_search_base = cn=users,dc=massive,dc=org,dc=au
ldap_user_object_class = user
ldap_user_home_directory = unixHomeDirectory
ldap_user_principal = userPrincipalName
ldap_user_name = uid
ldap_group_search_base = ou=groups,dc=massive,dc=org,dc=au
ldap_group_object_class = group
ldap_access_order = expire
ldap_account_expire_policy = ad
ldapRfc2307Pam: |
scope sub
nss_base_passwd cn=users,dc=massive,dc=org,dc=au?sub
nss_base_shadow cn=users,dc=massive,dc=org,dc=au?sub
nss_base_group cn=users,dc=massive,dc=org,dc=au?sub
nss_map_objectclass posixAccount user
nss_map_objectclass shadowAccount user
nss_map_objectclass posixGroup group
nss_map_attribute homeDirectory unixHomeDirectory
nss_map_attribute uniqueMember member
nss_map_attribute shadowLastChange pwdLastSet
pam_login_attribute sAMAccountName
pam_filter objectClass=User
pam_password ad
---
importRepo: { command: "wget http://cvlrepo.massive.org.au/repo/cvl.repo -O", destination: "/etc/yum.repos.d/cvl.repo" }
#yumGroupPackageList:
# - CVL Pre-installation
# - CVL Base Packages
# - CVL System
# - CVL System Extension
# - CVL General Imaging Tools