Working on DTU HPC Clusters
If you are working on a Windows machine, consider using MobaXterm to connect to the clusters; alternatively, you can install the Windows Subsystem for Linux (WSL) and connect from there. To move files between the cluster and your local machine you can use WinSCP, or you can mount the cluster directly on your system using sshfs. This is quite convenient, as the server then simply appears in your list of network drives. Have a look at https://docs.hpc.ait.dtu.dk/#sshfs (or google) to see how to get sshfs and connect. On Windows you need to install WinFsp and then SSHFS-Win.
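As a minimal sketch, mounting your cluster home from a Linux or WSL shell could look like this (the mount point ~/sophia is just an example name):
mkdir -p ~/sophia                      # create a local mount point
sshfs <user>@sophia.dtu.dk: ~/sophia   # the trailing colon mounts your remote home directory
fusermount -u ~/sophia                 # unmount again when you are done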
Sophia (DTU Wind)
Running scripts
Sophia runs the Slurm Workload Manager. Never run any script on the login node (the node you land on when you first connect, i.e. ssh -X <user>@sophia.dtu.dk). Instead, launch a serial job:
srun python your_script.py
This sends off your serial job to the cluster and automatically copies your environment to the run node.
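If you want to be explicit about the resources of such a serial job, the standard Slurm options can also be passed to srun directly; the values below are only an illustration:
srun --partition=workq --ntasks=1 --time=01:00:00 python your_script.py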
Alternatively, you can request an interactive node with:
salloc -n 1 --partition workq --time 10:00:00
You can change the partition to windq as well, and if you just want to work for an hour or so you can set --time 01:00:00 instead. To make life a little easier, some quick functions for launching interactive jobs are placed in pye2dpolar/tools/: qia is the shortcut for the salloc command above, whereas qi launches an interactive job using srun (in that case you need to use srun python your_script.py even when on the node). Both shortcuts also take command line inputs (just check out the scripts to see the options).
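As a concrete variant of the salloc command above, a one-hour interactive session on the windq partition would be:
salloc -n 1 --partition windq --time 01:00:00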
Connect to the node (you can also use the node name sn<node_id>):
ssh -X $SLURM_NODELIST
Once you are on the node, do not forget to activate the pye2dpolar environment and move to the run folder; then you are ready to run directly with
python your_script.py
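Putting the steps together, a minimal sketch of a full interactive session could look as follows (the activation script path is the one used by the runjupy script further below, and <your_run_folder> is a placeholder):
salloc -n 1 --partition workq --time 01:00:00         # wait for the allocation
ssh -X $SLURM_NODELIST                                # jump onto the allocated node
source $PYE2DPOLAR_PATH/build/activate_pye2dpolar.sh  # activate the pye2dpolar environment
cd <your_run_folder>                                  # move to your run folder
python your_script.py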
If you want to check how much activity there is on the cluster and in the queue, you can create the following aliases in your .bashrc for convenience:
alias free='sinfo -o "%20P%8D%16A%8c%N"'
alias sq='squeue -u <user> --format "%.18i %.9P %.15j %.8u %.8T %.10M %.9l %.6D %R"'
alias swind='squeue -l -p windq'
alias swork='squeue -l -p workq'
This will give you an overview of how many free computational resources there are, so you can choose the partition accordingly.
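With these aliases in place, day-to-day queue handling boils down to a few commands; scancel is not one of the aliases above, just the standard Slurm way to kill one of your jobs:
free                 # free resources per partition
sq                   # your own jobs
swind                # everything on the windq partition
scancel <job_id>     # cancel one of your jobs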
Development with Jupyter
A nice way to interactively run CFD polars and analyse them is through Jupyter notebooks. To do this on Sophia you essentially start a batch job that runs a Jupyter kernel, to which you then connect from your local machine. This method is largely thanks to Kenneth Lønbæk.
From your local machine (on Windows you can use the local terminal in MobaXterm), connect to the login node while forwarding port 8890 (you can change the port number):
ssh -L 8890:localhost:8890 <user>@sophia.dtu.dk
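If you do this often, an alternative is an entry in your local ~/.ssh/config (the host alias sophia is just an example name), after which a plain ssh sophia sets up the same forwarding:
Host sophia
    HostName sophia.dtu.dk
    User <user>
    LocalForward 8890 localhost:8890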
Once on the login node you can launch the runjupy.sh Slurm job from anywhere you like; an alias has automatically been created in your .bashrc during installation, so you can just type runjupy. Here is the runjupy.sh script:
#!/bin/sh
port="8890"
time="10"
queue="workq"
# parse the command line options for port, walltime and queue
while :; do
    case $1 in
        -p|--port) port=$2
            shift
            ;;
        -t|--time) time=$2
            shift
            ;;
        -q|--queue) queue=$2
            shift
            ;;
        *) break
    esac
    shift
done
echo launching jupyter kernel....
echo port: $port "|" time: $time "|" queue: $queue
# write the Slurm batch script with the chosen settings filled in
cat <<EOF > srunjupy.sh
#!/bin/sh
#SBATCH --job-name=jupy
#SBATCH --workdir=.
#SBATCH --output=jupy.o%j
##SBATCH --error=jupy.e%j
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=$queue
#SBATCH --distribution=plane=1
#SBATCH --time=$time:00:00
#SBATCH --exclusive
ulimit -s unlimited
echo running jupyter kernel....
echo port: $port "|" time: $time "|" queue: $queue
source $PYE2DPOLAR_PATH/build/activate_pye2dpolar.sh
# start notebook
echo \$SLURM_JOB_NODELIST > jupyinfo
echo \$SLURM_SUBMIT_DIR >> jupyinfo
jupyter notebook --no-browser --port=$port
wait
EOF
sbatch srunjupy.sh
# wait until job runs and file is written
echo "waiting for kernel to start..."
until [ -f jupyinfo ]
do
    sleep 1
done
# SLURM_JOB_NODELIST=$(sed -n '1p' jupyinfo | xargs) xargs removes whitespace
# SLURM_SUBMIT_DIR=$(sed -n '2p' jupyinfo | xargs)
SLURM_JOB_NODELIST=$(sed -n '1p' jupyinfo)
SLURM_SUBMIT_DIR=$(sed -n '2p' jupyinfo)
rm jupyinfo
echo "ssh -t -L $port:localhost:$port $SLURM_JOB_NODELIST 'cd $SLURM_SUBMIT_DIR; bash --login'"
ssh -t -L $port:localhost:$port $SLURM_JOB_NODELIST 'cd '$SLURM_SUBMIT_DIR'; bash --login'
rm srunjupy.sh
There are defaults for the port, the queue and the time you wish to run the kernel for, but you can change those through command line inputs, e.g. runjupy -t 5 -q workq -p 8891. Once the job is launched (you might need to wait for a node to free up), the script directly connects you to the node that is running your notebook and forwards the port you specified. It also automatically creates an ssh connection script in your submit directory, which you can simply launch to set up the same connection again. Now you should be able to reach the notebook through your web browser; just load your local port by typing the following in the address field:
http://localhost:8890/
The startup of the notebook might take a little while, so be patient; it may throw some error messages in your terminal, but they will disappear once the notebook is running. If you need the access token, have a look at the Slurm run output jupy.o<run_id> inside the launch folder; there you can also check whether the kernel is running as expected.
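Assuming the notebook prints its access URL in the usual Jupyter form (ending in ?token=<hex>), a quick way to pull the token out of that output file is, for example:
grep -o 'token=[0-9a-f]*' jupy.o<run_id> | head -n 1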
Gbar (DTU Central)
Information about the Gbar cluster, including troubleshooting, can be found here: https://www.hpc.dtu.dk/?page_id=2534 (connect with ssh -X <user>@login1.gbar.dtu.dk). As for Sophia, never run jobs on the login nodes! Also read the Sophia section above, as there is a lot of overlap between the two.
One can request an interactive node, where you can do all your work, with:
linuxsh
This also moves you directly onto the interactive node.
When submitting jobs here you need to set queue=hpc (if you do not, the code will do it for you). On the interactive node you can then simply run your scripts with:
python your_script.py
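Putting the Gbar steps together, a sketch of a session could look like this, assuming pye2dpolar is installed and activated the same way as on Sophia (the activation path is taken from the runjupy script and may differ on the Gbar):
linuxsh                                               # request an interactive node
source $PYE2DPOLAR_PATH/build/activate_pye2dpolar.sh  # activate the environment (assumed same path as on Sophia)
cd <your_run_folder>                                  # move to your run folder
python your_script.py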
If you would like to see the progress of your jobs or the available computer resources refer to the detailed documentation here: https://www.hpc.dtu.dk/?page_id=1519.