Overall Flow
------------
1. Frontend static files (HTML and .js) are fetched. These could be served from a CDN, but at the moment they are served from the same instance as the backend, under a different domain name.
2. The frontend generates an SSH key and gets a certificate from SSHAuthZ. This is an OAuth2 Implicit flow (the original sshauthz by Jason doesn't support this, so I wrote a new one). Access to the certificate is mediated by OpenID Connect through either AAF or Google (if Apple gets their act together I'll be happy to support login with Apple).
3. Having got a certificate, the frontend is able to log in to a number of clusters (remember MonARCH, CVL@UWA and M3 share the same certificates). Config files tell the frontend about the login nodes. The frontend (optionally) uses the key/cert to get user info (like quotas) and to check Cachet.
At this point the user selects an application (e.g. a desktop or a Jupyter notebook) and the frontend pulls in an iframe loaded with the form for the particular login node. This form is responsible for providing the sbatch command or the PBS command (each cluster therefore has a different form).
Having got the sbatch command, the frontend can submit a job to the cluster.
At this point all communication goes through the backend component TES (Tunnel and Execution Service). TES understands simple GETs and PUTs as well as SSH tunnels (a sketch of such a call follows this list).
4. Once the job is running, the user can click "Connect". This opens a new tab to the URL "strudel2-api.cloud.cvl.org.au". This URL path connects to the TWS (Transparent WebSockets Proxy) component, which is co-hosted in the same Docker container as the TES. Unlike the TES, which is written at the HTTP application layer, the TWS is written at the Unix socket layer. The TWS searches the incoming bytes for a magic cookie and then connects those incoming bytes to the correct SSH tunnel.
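
As a rough illustration of step 3, submitting a job boils down to asking the TES to run the sbatch command on the chosen login node over SSH. This is only a sketch: the endpoint path, field names and the way credentials are passed are assumptions for illustration, not the actual TES API.

```python
# Minimal sketch of submitting a job through the TES, assuming a hypothetical
# /api/execute endpoint that runs a command on a login node over SSH.
# The endpoint, field names and credential handling are illustrative only.
import requests

TES_URL = "https://strudel2-api.cloud.cvl.org.au"   # backend chosen from the config


def submit_job(login_host: str, sbatch_cmd: str, cert: str, key: str) -> str:
    """Ask the TES to run the sbatch command produced by the batchbuilder form."""
    resp = requests.post(
        f"{TES_URL}/api/execute",        # hypothetical execution endpoint
        json={
            "host": login_host,          # login node from the compute-site config
            "cmd": sbatch_cmd,           # e.g. "sbatch /path/to/jobscript.sh"
            "cert": cert,                # SSH certificate issued by SSHAuthZ
            "key": key,                  # matching private key
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["stdout"]         # e.g. "Submitted batch job 123456"
```

Listing and cancelling jobs would presumably follow the same pattern of a simple GET or PUT carrying a command for the login node.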
Components
----------
Each component contains its own documentation directory with installation instructions.
This repository is the Strudel2 Frontend (an SPA written in Angular).
https://gitlab.erc.monash.edu.au/hpc-team/pysshauthz is the SSHAuthZ component, needed to map web-based login (i.e. OpenID Connect) to SSH certificates for access to the cluster.
https://gitlab.erc.monash.edu.au/hpc-team/batchbuilder is the code that provides a form for each application. This version is used by Monash and is specific to Slurm.
Configuration
-------------
There are a number of configuration files for Strudel2. All configs can be supplemented by loading a local JSON file into the frontend.
1. The list of backends. Usually code served from strudel2-test connects to strudel2-api-test, but this is not necessarily the case. The list of backends allows the user to choose the closest backend (in terms of network latency) or to fail over from one backend to another in the event of an outage.
2. The list of SSH AuthZ servers. Each compute site is expected to operate its own sshauthz server, since if you control the CA you can impersonate any user at the site.
3. The list of compute resources (i.e. clusters/sites). Each site defines:
- a login node
- a CA which can issue certs for this login node
- a website capable of generating the correct sbatch/qsub/bash command to execute programs
- a command to list all jobs
- a command to cancel a job
- a list of apps
4. The list of apps. Each application defines:
- a command to start the application
- a number of actions to be performed on the application (e.g. connecting to the application). Each action defines:
  - a command to return a JSON blob
  - a URL that, when filled in with data from the JSON blob, will connect the user.
Application lists have some extra parameters that are currently unused. For example, the applist allows you to define applications recursively (if you want to have desktops served via Guacamole, noVNC or xrdp). They can also define a URL for configuring the application (as opposed to the URL for configuring the cluster resources). That URL might allow you to configure input files, for example.
Some possible actions include connecting to the app, viewing the output logs, and viewing the sacct reporting for the app. A sketch of a compute-resource entry with one such app follows this list.
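
To make items 3 and 4 concrete, here is a minimal sketch of what a compute-resource entry and an app entry might look like as JSON. The field names, commands and URLs are illustrative assumptions, not the exact schema the frontend expects.

```python
# Illustrative sketch of a compute-resource config entry with one app.
# Field names, commands and URLs are assumptions for illustration only.
import json

m3_config = {
    "name": "M3",
    "loginNode": "m3.massive.org.au",            # login node the TES connects to
    "sshAuthZ": "https://sshauthz.example.org",  # CA that issues certs for this node
    "batchBuilder": "https://batchbuilder.example.org/m3",  # form that builds the sbatch command
    "listCommand": "squeue -u $USER",            # list all jobs
    "cancelCommand": "scancel {jobid}",          # cancel a job
    "apps": [
        {
            "name": "Jupyter Notebook",
            "startCommand": "/usr/local/sv2/jupyter-start.sh",  # hypothetical start script
            "actions": [
                {
                    "name": "Connect",
                    # command that prints a JSON blob, e.g. {"port": 8888, "token": "..."}
                    "paramsCommand": "/usr/local/sv2/jupyter-params.sh",
                    # URL template filled in with data from the JSON blob
                    "connectUrl": "https://{host}:{port}/?token={token}",
                },
            ],
        },
    ],
}

print(json.dumps(m3_config, indent=2))
```

The idea is that an action's URL template is filled in with the blob returned by its params command, which is how step 4 of the overall flow knows where to send the user.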
Cluster Scripts
---------------
Depending on what applications a cluster supports, it will need some simple scripts to start the applications and to generate JSON output (like what port the application is running on and what its API token/password is). For M3 these are installed in /usr/local/sv2 and /usr/local/sv2/dev (for development scripts). I was hoping that these would be independent of the cluster/job scheduler and could be installed into a container along with the application itself, but this is turning out not to be the case. For example, a user might run more than one Jupyter notebook on a node, in which case we need a way to determine which notebook they were trying to connect to. The only way to do this turns out to be to use sstat to figure out which processes belong to which job.
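
As an illustration of the kind of script involved, here is a minimal sketch (assuming Slurm and Jupyter) of a params script that maps a job ID to the notebook belonging to it and prints the JSON blob the frontend needs. The use of the sstat "Pids" field, the "jupyter notebook list" output and the script behaviour are assumptions; the actual scripts under /usr/local/sv2 may differ.

```python
#!/usr/bin/env python3
# Sketch of a cluster-side "params" script, assuming Slurm and Jupyter.
# Emits a JSON blob (port + token) for the notebook that belongs to a job.
# The sstat "Pids" field and the jupyter list output format are assumptions.
import json
import subprocess
import sys


def job_pids(jobid: str) -> set:
    """Ask Slurm which PIDs belong to this job (assumes sstat exposes a Pids field)."""
    out = subprocess.run(
        ["sstat", "-j", jobid, "--format=Pids", "--noheader", "--parsable2"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {int(p) for line in out.splitlines()
            for p in line.replace(",", " ").split() if p.isdigit()}


def notebook_params(pids: set) -> dict:
    """Find the notebook server owned by one of these PIDs via 'jupyter notebook list'."""
    out = subprocess.run(
        ["jupyter", "notebook", "list", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():          # one JSON object per running server
        info = json.loads(line)
        if info.get("pid") in pids:
            return {"port": info["port"], "token": info["token"]}
    raise SystemExit("no notebook found for this job")


if __name__ == "__main__":
    print(json.dumps(notebook_params(job_pids(sys.argv[1]))))
```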