This section provides guidance on using workflow management systems such as Nextflow, Snakemake, or bioBakery workflows on the M3 Slurm Cluster.
As we are not experts on the listed tools, please tell us if you find any of this information to be insufficient or incorrect.
Each of the systems mentioned above uses a head process that starts a workflow of dependent and independent tasks.
These tasks can then be run (automatically) within Slurm jobs.
The main requirement of a head process is that it keeps running for the duration of the whole workflow. It is therefore always advisable to run interactive workflow sessions within screen or tmux, as those can be detached and survive a lost SSH connection.
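For example (the session name and workflow command are placeholders), a detachable tmux session could be used like this:
tmux new -s workflow-XYZ          # open a named session
# ... start your workflow head process inside the session ...
# detach with Ctrl-b d; the session keeps running even if your SSH connection drops
tmux attach -t workflow-XYZ       # reattach later to check on the workflow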
Please also note that most bioinformatics software is not parallelizable over multiple (independently started) processes. The associated compute tasks should therefore use the Slurm defaults --ntasks 1 and --nodes 1, unless you are absolutely sure that something like MPI is supported.
If the whole workflow is short (<48h) and does not collide with our regular Monday morning maintenance window, the head process can (and should) be started directly on the login node.
Note that if you start a lot of workflows from the login nodes you will eventually hit the local CPU-time or memory limits and should then also consider using Slurm allocations/jobs for your head process.
If using the login nodes is not advisable, use
salloc --partition=cpu3-long --job-name="workflow-XYZ" --no-shell --time $DAYS-$HOURS
to launch a suitable interactive allocation for the head process. Alternatively you can also use
sbatch --partition=cpu3-long --time $DAYS-$HOURS workflow-XYZ.sh
to start a head process from a non-interactive job (assuming a suitable job script).
In either case, replace $DAYS and $HOURS as required and workflow-XYZ with something descriptive for you. Contact us if your whole workflow needs longer than 14 days, the maximum walltime of the cpu3-long partition.
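For the non-interactive variant, a minimal (hypothetical) head-process job script workflow-XYZ.sh could look like this:
#!/bin/bash
# this script only runs the head process; the workflow manager itself
# submits the actual compute tasks as separate Slurm jobs
# ... start your workflow head process here, e.g. a nextflow or snakemake call ...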
Depending on which workflow manager is used, the allocation might need slightly more CPUs or memory. In the salloc/sbatch call use (see the example below):
- --cpus-per-task $CPUS to adjust the number of available CPU threads. $CPUS could be 2 (default), 4 or 6.
- --mem $MEM to increase the amount of allocated memory. This is only necessary if the head job performs memory-intensive setup, like downloading/creating container images, etc. $MEM should rarely exceed 20G or 30G.
Anything above the mentioned limits is probably excessive, does not help and will in the worst case negatively impact you and/or others.
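Put together, a (hypothetical) head-process allocation with slightly increased resources could be requested like this:
salloc --partition=cpu3-long --job-name="workflow-XYZ" --no-shell --time 5-0 --cpus-per-task 4 --mem 16G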
Please remember that this allocation is not for the whole workflow, just for its head process. This also means that --ntasks and --gres should not be used, as their respective defaults (1 and undefined) are the required values. Each workflow manager provides its own independent way of specifying resources for the actual compute tasks.
The final interactive allocation can then be used as described above to launch a head process within a screen or tmux session, now running inside a Slurm allocation.
Both approaches (interactive and non-interactive head-process jobs) can speed up rolling out compute node updates and considerably help us deal with hardware problems on isolated nodes.
This section is about custom Nextflow parameters for the M3 Slurm Cluster.
If you are interested in its general usage, please look at the linked documentation above.
TL;DR - The public M3 Cluster profile provided by nf-core should provide reasonable defaults for most workflows (pipelines). To use it, simply always call
nextflow run <your script> -profile m3c [-disable-jobs-cancellation] [-resume]
Nextflow pipeline processes are generally resource-aware and will automatically request suitable values, adjust them and resubmit if they turn out to be insufficient.
See the linked documentation above (and the nextflow run arguments) if you want to customize Nextflow for the M3 Cluster yourself.
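For illustration, launching an nf-core pipeline with this profile (the pipeline name and parameters are placeholders; check the respective pipeline's documentation for its required inputs) could look like this:
nextflow run nf-core/rnaseq -profile m3c --input samplesheet.csv --outdir results -resume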
This section is about custom Snakemake parameters for the M3 Slurm Cluster.
If you are interested in its general usage, please look at the linked documentation above.
The most natural way of letting (the latest version of) Snakemake utilize Slurm seems to be to install and use the slurm executor plugin as described on https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html.
Most notable are that resource/job requirements can be assigned to jobs, that they can take dynamic functions as values (see here) and that global limits and defaults can be set via parameters such as the following (a combined example is given below):
- --jobs / --cores : the maximum number of jobs/cores used in parallel
- --default-resources [NAME=INT ...] : default resource requirements if undefined in a rule
More possibilities are described on https://snakemake.readthedocs.io/en/stable/executing/cli.html#execution.
Should you need to use local storage, please use the following option values:
- --local-storage-prefix $SCRATCH on the login node, or --local-storage-prefix /tmp if the head process is run on a compute node (as described above)
- --remote-job-local-storage-prefix /tmp
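Putting this together, a hypothetical head-process invocation on a compute node might look like the following; the job and resource numbers are placeholders, and it assumes the Slurm executor plugin is installed in your Snakemake environment:
snakemake --executor slurm --jobs 50 --default-resources mem_mb=2000 runtime=60 --local-storage-prefix /tmp --remote-job-local-storage-prefix /tmp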
We furthermore recommend using Apptainer where possible (if your rules support it). See https://snakemake.readthedocs.io/en/stable/executing/cli.html#apptainer/singularity for the corresponding Snakemake options.
This section is about custom bioBakery parameters on the M3 Slurm Cluster.
If you are interested in its general usage, please look at the linked documentation above.
See https://github.com/biobakery/biobakery_workflows?tab=readme-ov-file#parallelization-options for how to use Slurm when running a bioBakery workflow.
Most notable are the options (see the example below):
- --grid slurm to select Slurm as the grid engine
- --grid-jobs <number> to specify the maximum number of parallel tasks
- --partition <partition> to select a suitable partition for running the tasks
Furthermore, look at https://github.com/biobakery/anadama2?tab=readme-ov-file#run-in-a-grid-computing-environment on how to allow tasks to use the grid and specify their individual resource requirements.
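For illustration only (the workflow name and paths are placeholders, and option names should be double-checked against biobakery_workflows --help for your installed version), a Slurm-backed run could look roughly like this:
biobakery_workflows wmgx --input input_fastq/ --output results/ --grid slurm --grid-jobs 10 --partition <partition>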
If you need specific software let us know.
If it is available and can be maintained with reasonable effort, we can install it globally.
CUDA is installed on GPU nodes.
Currently only version 11.8 is installed. Let us know if you are interested in other versions.
OpenJDK Java runtime versions 11 and 17 (default) are installed.
It is possible to work with Jupyter notebooks by using an Apptainer container such as docker://quay.io/jupyter/base-notebook
or some other personalized environment.
To actually start a Jupyter notebook on a compute node, use the following jupyter-notebook options within a srun call or within a Slurm job that is submitted via sbatch:
jupyter-notebook --no-browser --port=$(( 10000+RANDOM%10000 )) --ip="$( hostname -s ).m3s"
Modify this as required: apptainer, python3, etc. and their corresponding options can be prepended, and other Jupyter options can be appended. Be sure to include the .m3s suffix in the --ip option. If you are using srun interactively, please quote accordingly to make sure that hostname -s is executed on the compute node.
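As a sketch (resource values and output path are placeholders you should adapt), a corresponding sbatch job script could look like this:
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=8:00:00
#SBATCH --output=/path/readable/only/by/you/jupyter-%j.out
# the container is just the example image mentioned above; use your own environment if preferred
apptainer exec docker://quay.io/jupyter/base-notebook jupyter-notebook --no-browser --port=$(( 10000+RANDOM%10000 )) --ip="$( hostname -s ).m3s"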
After a short setup time of about a minute, the Jupyter notebook job will be ready and print something like
[I 2024-12-31 08:24:32.123 ServerApp] Jupyter Server 2.14.1 is running at:
[I 2024-12-31 08:24:32.123 ServerApp] http://c001.m3s:11149/lab?token=7b6dd3ff79210123456789ab7b6dd3ff79210123456789ab
[I 2024-12-31 08:24:32.123 ServerApp] http://127.0.0.1:11149/lab?token=7b6dd3ff79210123456789ab7b6dd3ff79210123456789ab
to its output file or terminal output. Please note the node name and port number from http://<node>.m3s:<port>, as well as the token. Now run
user@local:~$ ssh -N -L <port>:<node>:<port> <user>@l1.m3c.uni-tuebingen.de
on your local PC. This forwards the notebook port to your PC, after which you can use your browser to access the Jupyter notebook by opening http://localhost:<port>/lab?token=<token>.
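With the example output above (node c001, port 11149), this would be:
user@local:~$ ssh -N -L 11149:c001:11149 <user>@l1.m3c.uni-tuebingen.de
after which http://localhost:11149/lab?token=7b6dd3ff79210123456789ab7b6dd3ff79210123456789ab can be opened in your local browser.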
Note that improperly secured Jupyter notebooks are a considerable security risk, as they open ports that are accessible to all users and allow others to impersonate you. Therefore make sure that a token is used and that it is sufficiently long and random. Also let sbatch jobs write their output files (which contain the token) into directories that are only readable by you, to ensure that no other user can read them.
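For example, a private directory for the job output (the path and script name are just illustrations) could be prepared like this before submitting the job:
mkdir -p ~/jupyter-logs
chmod 700 ~/jupyter-logs   # only you can enter or read this directory
sbatch --output ~/jupyter-logs/jupyter-%j.out jupyter-job.sh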
As always, contact us if you run into problems or have questions.
The minimal runtime R-core
is installed on all nodes.
The login nodes additionally contain R
development packages.
Hence, if you want to use custom R packages, please use the login nodes to install them into a shared directory, from which they can also be loaded on the compute nodes.
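A minimal sketch for installing an R package on a login node into a user library that is also visible on the compute nodes (the path and package name are placeholders):
mkdir -p ~/R/library
export R_LIBS_USER=~/R/library    # also set this in job scripts / ~/.bashrc so R finds the packages on compute nodes
R -q -e 'install.packages("data.table", repos="https://cloud.r-project.org")'
# install.packages() uses the first writable library path, which includes $R_LIBS_USER once the directory exists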