Working on DTU HPC Clusters

If you are working on a Windows machine, consider using MobaXterm to connect to the clusters; alternatively, you can install the Windows Subsystem for Linux (WSL) on your machine and connect from there. To move files between the cluster and your local machine you can use WinSCP, or you can mount the cluster directly on your system using sshfs. This is pretty convenient, as the server then simply appears in the list of network drives. Have a look at https://docs.hpc.ait.dtu.dk/#sshfs for how to get sshfs and connect (or google it). On Windows you need to install WinFsp and then SSHFS-Win.
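If you prefer the sshfs route from Linux or WSL, mounting the cluster is a one-liner; the mount point and remote path below are placeholders, so adapt them to your own setup:

mkdir -p ~/sophia
sshfs <user>@sophia.dtu.dk:/home/<user> ~/sophia -o reconnect
# unmount again when you are done
fusermount -u ~/sophia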

Sophia (DTU Wind)

Running scripts

Sophia runs the Slurm Workload Manager. Never run any script on the login node (the one you land on when you first connect, i.e. ssh -X <user>@sophia.dtu.dk). Instead, launch a serial job:

srun python your_script.py

This sends off your serial job to the cluster and automatically copies your environment to the run node.
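srun also accepts the usual Slurm resource options, so a sketch of a one-task job on the workq partition with a one-hour time limit could look like this (partition and time limit are only examples):

srun -n 1 --partition workq --time 01:00:00 python your_script.py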

Alternatively, you can request an interactive node with:

salloc -n 1 --partition workq --time 10:00:00

You can change the partition to windq as well, and if you only want to work for an hour or so you can set --time 01:00:00. To make life a little easier, there are some quick functions for launching interactive jobs in pye2dpolar/tools/. Here qia is the shortcut for the code snippet above, whereas qi launches an interactive job using srun (then you need to use srun python your_script.py even when on the node). Those shortcuts also take command line inputs (just check out the scripts to see the options).
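For instance, the one-hour interactive session on windq mentioned above would be requested like this:

salloc -n 1 --partition windq --time 01:00:00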

Connect to the node (you can also use the node name sn<node_id>):

ssh -X $SLURM_NODELIST

Once you are on the node, do not forget to activate the pye2dpolar environment and move to the run folder; then you are ready to run directly with

python your_script.py
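Putting the interactive workflow together, a typical session could look like the sketch below. The activation script path is the one used by the runjupy script further down, and the run folder is of course your own:

salloc -n 1 --partition workq --time 10:00:00
ssh -X $SLURM_NODELIST
# once on the compute node:
source $PYE2DPOLAR_PATH/build/activate_pye2dpolar.sh
cd /path/to/your/run/folder
python your_script.py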

If you want to check how much activity there is on the cluster and in the queue, you can create the following aliases in your .bashrc for convenience:

alias free='sinfo -o "%20P%8D%16A%8c%N"'
alias sq='squeue -u <user> --format "%.18i %.9P %.15j %.8u %.8T %.10M %.9l %.6D %R"'
alias swind='squeue -l -p windq'
alias swork='squeue -l -p workq'

This will give you an overview of how many free computational resources there are, so you can choose the partition accordingly.
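After adding the aliases, reload your .bashrc and you can query the cluster state directly, for example:

source ~/.bashrc
free    # allocated/idle node counts per partition
sq      # your own jobs in the queue
swind   # everything running or pending on windq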

Development with Jupyter

A nice way to interactively run CFD polars and analyse them is through Jupyter notebooks. To do this on Sophia you essentially need to start a batch job that runs a Jupyter kernel, to which you then connect from your local machine. This method is largely thanks to Kenneth Lønbæk.

From your local machine (on Windows you can use the local terminal in MobaXterm), connect to the login node while forwarding port 8890 (you can change the port number):

ssh -L 8890:localhost:8890 <user>@sophia.dtu.dk

Once on the login node, you can launch the runjupy.sh Slurm job from anywhere you like; an alias has automatically been created in your .bashrc during installation, so just type runjupy. Here is the runjupy.sh script:

#!/bin/sh
port="8890"
time="10"
queue="workq"
while :; do
    case $1 in
        -p|--port) port=$2
        shift
        ;;
        -t|--time) time=$2
        shift
        ;;
        -q|--queue) queue=$2
        shift
        ;;
        *) break
    esac
    shift
done
echo launching jupyter kernel....
echo port: $port "|" time: $time "|" queue: $queue
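# write the actual Slurm batch script; $port, $time and $queue are filled in now,
# while the \$-escaped variables are only expanded when the job runs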
cat <<EOF > srunjupy.sh
#!/bin/sh
#SBATCH --job-name=jupy
#SBATCH --workdir=.
#SBATCH --output=jupy.o%j
##SBATCH --error=jupy.e%j
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=$queue
#SBATCH --distribution=plane=1
#SBATCH --time=$time:00:00
#SBATCH --exclusive
ulimit -s unlimited

echo running jupyter kernel....
echo port: $port "|" time: $time "|" queue: $queue

source $PYE2DPOLAR_PATH/build/activate_pye2dpolar.sh
# start notebook
echo \$SLURM_JOB_NODELIST > jupyinfo
echo \$SLURM_SUBMIT_DIR >> jupyinfo
jupyter notebook --no-browser --port=$port
wait
EOF
sbatch srunjupy.sh
# wait until job runs and file is written
echo "waiting for kernel to start..."
until [ -f jupyinfo ]
do
     sleep 1
done
# SLURM_JOB_NODELIST=$(sed -n '1p' jupyinfo | xargs) xargs removes whitespace
# SLURM_SUBMIT_DIR=$(sed -n '2p' jupyinfo | xargs)
SLURM_JOB_NODELIST=$(sed -n '1p' jupyinfo)
SLURM_SUBMIT_DIR=$(sed -n '2p' jupyinfo)
rm jupyinfo
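# print the ssh connection command and then open a shell on the compute node,
# forwarding the notebook port back to this login node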
echo "ssh -t -L $port:localhost:$port $SLURM_JOB_NODELIST 'cd $SLURM_SUBMIT_DIR; bash --login'"
ssh -t -L $port:localhost:$port $SLURM_JOB_NODELIST 'cd '$SLURM_SUBMIT_DIR'; bash --login'
rm srunjupy.sh
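For reference, the runjupy alias that the installation adds to your .bashrc presumably just points at this launcher script, along these lines (the exact path here is a guess and depends on your installation):

alias runjupy='bash $PYE2DPOLAR_PATH/tools/runjupy.sh'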

There are defaults for the port, the queue and the time you wish to run the kernel for, but you can change those through command line inputs, i.e. runjupy -t 5 -q workq -p 8891. Once launched (you might need to wait for a node to free up), the script prints the ssh connection command and then connects you directly to the node that is running your notebook, forwarding the port you specified.

Now you should be able to reach the notebook through your web browser; just load your local port by typing the following in the address field:

http://localhost:8890/

The startup of the notebook might take a little while, so be patient while it throws some error messages in your terminal; they will disappear once the notebook is running. If you need the access token, have a look in the Slurm run output jupy.o<run_id> inside the launch folder, where you can also check whether the kernel is running as expected.
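One way to grab the token is to search the Slurm output file for the login URL that Jupyter prints there, for example:

grep token jupy.o<run_id>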

Gbar (DTU Central)

Information about the Gbar cluster, including troubleshooting, can be found here: https://www.hpc.dtu.dk/?page_id=2534. You connect with ssh -X <user>@login1.gbar.dtu.dk. As for Sophia, never run jobs on the login nodes! Also read the Sophia text above, as there is a lot of overlap between the two clusters.

You can request an interactive node, where you can do all your work, with:

linuxsh

This will also move you directly onto the interactive node.

When submitting jobs here you need to set queue=hpc (if you do not, the code will do it for you). On the interactive node you can then run your scripts simply with:

python your_script.py

If you would like to see the progress of your jobs or the available compute resources, refer to the detailed documentation here: https://www.hpc.dtu.dk/?page_id=1519.