Step 6: Test on Biowulf HPC

When running on Biowulf HPC, we need to configure settings specific to that environment—most importantly, specifying which executor to use.

This is typically done using profiles in the nextflow.config file. A profile allows you to bundle environment-specific settings like the executor type, memory and CPU allocations, and other resource or cluster-specific configurations. You can activate a profile using the -profile option when launching a pipeline.

On Biowulf, the key setting is:

executor = 'slurm'
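
In a nextflow.config this setting typically lives under the process scope, often wrapped in a profile so it only applies when that profile is selected. A minimal sketch (the profile name hpc_slurm is illustrative only, not an official profile):

profiles {
    hpc_slurm {
        process.executor = 'slurm'
        // other cluster-specific options (queue, clusterOptions, ...) can be bundled here too
    }
}

Launching with -profile hpc_slurm would then route every process through Slurm.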

Let’s break down why this matters.


✅ What happens when executor = 'slurm' is set?

When you specify executor = 'slurm', Nextflow:

  • Submits jobs using sbatch
  • Translates process directives (cpus, memory, time) into #SBATCH options
  • Automatically manages cluster queue behavior (submission rate, polling interval, retries, etc.)
  • Ensures each process runs on a compute node, not the local machine

This setup is essential for proper resource allocation and performance on clusters like Biowulf.
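
To make the translation concrete, here is a minimal, self-contained sketch of a process (EXAMPLE_TASK is a made-up name, not part of the pipeline). With executor = 'slurm', its cpus, memory, and time directives become the CPU, memory, and walltime requests of the sbatch job that runs it; the exact Slurm flags Nextflow emits can vary by version.

process EXAMPLE_TASK {
    // translated into the corresponding Slurm resource requests when executor = 'slurm'
    cpus   4
    memory 16.GB
    time   4.h

    script:
    """
    echo "Running on: \$(hostname)"
    """
}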


❌ What happens if executor is not set?

If you don’t specify an executor, Nextflow defaults to:

executor = 'local'

Which means:

  • All processes run on the same machine where nextflow run is executed
  • No jobs are submitted to Slurm
  • Process directives like cpus, memory, and time are not translated into Slurm resource requests
  • You are simply multithreading on a single node, either in an interactive session or within an sbatch-wrapped shell

This is not what you want on an HPC cluster like Biowulf.
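
A quick way to verify which executor a run actually used is to check the run log that Nextflow writes in the launch directory:

# .nextflow.log is created where 'nextflow run' was launched;
# look for the lines reporting the active executor (e.g. "executor > slurm")
grep -i "executor" .nextflow.log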


How and where are resources set?

Just like the executor, resources like CPU, memory, and time are set through configuration files and Nextflow process labels.

For example, in manta/germline/main.nf:

process MANTA_GERMLINE {
    tag "$meta.id"
    label 'process_medium'
    label 'error_retry'
    // ... inputs, outputs, and script block omitted here for brevity
}

The label process_medium refers to this block in conf/base.config:

withLabel:process_medium {
    cpus   = { 6     * task.attempt }
    memory = { 36.GB * task.attempt }
    time   = { 8.h   * task.attempt }
}

This setup dynamically scales resources with retry attempts.
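
If the defaults in conf/base.config do not suit your data, the usual approach is to override them from your own config rather than editing the module. A sketch using a process selector (the values below are arbitrary examples, not recommendations):

process {
    withName: 'MANTA_GERMLINE' {
        cpus   = 12
        memory = 72.GB
        time   = 16.h
    }
}

Selectors also accept patterns, which is handy because nf-core process names are fully qualified (e.g. matching with withName: '.*MANTA_GERMLINE').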


Biowulf Support in nf-core/configs

Biowulf is one of the many clusters pre-configured in the nf-core/configs repository. This means you don’t have to define Biowulf-specific settings yourself.

You can simply run:

nextflow run main.nf -c nextflow.config -profile biowulf

Nextflow will automatically pull the Biowulf configuration from:

🔗 https://github.com/nf-core/configs/blob/master/conf/biowulf.config

This profile sets executor = 'slurm', defines staging behavior, handles file caching, and more—ready to use out of the box.
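
Under the hood, nf-core pipelines pull these shared profiles in by including a remote config from the nf-core/configs repository. The exact wiring differs between pipelines and versions, but it usually follows this convention (parameter names below are the common nf-core pattern and may not match your pipeline exactly):

// Simplified sketch of how institutional profiles are loaded
params.custom_config_version = 'master'
params.custom_config_base    = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"

try {
    includeConfig "${params.custom_config_base}/nfcore_custom.config"
} catch (Exception e) {
    System.err.println("WARNING: Could not load nf-core/configs profiles")
}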

For more info, see:

🔗 Biowulf-specific Nextflow docs at NIH: https://hpc.nih.gov/apps/nextflow.html

🔗 nf-core/configs homepage: https://github.com/nf-core/configs


⚠️ Naming Conflicts When Adding Your Own Profiles

If you define your own biowulf profile in your local nextflow.config, it can clash with the official one pulled in from nf-core/configs: when two configuration sources define a profile with the same name, their settings are merged in a resolution order that is easy to get wrong.

To avoid this, use a different name for custom configs:

profiles {
  biowulf_custom {
    process.executor = 'slurm'
    // your overrides here
  }
}

Then launch with:

nextflow run main.nf -c nextflow.config -profile biowulf_custom

How to Execute

Create a batch submission script (e.g. run_manta_on_biowulf.sh) that launches the master Nextflow process. For example:

#!/bin/bash
#SBATCH --job-name=nextflow-main
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --gres=lscratch:200
#SBATCH --time=24:00:00

module load nextflow

# Cache Singularity images under /data so they persist across runs
export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache
export SINGULARITY_CACHEDIR=/data/$USER/.singularity

# Use node-local scratch for temporary files and cap the Nextflow JVM heap
export TMPDIR=/lscratch/$SLURM_JOB_ID
export NXF_JVM_ARGS="-Xms2g -Xmx4g"

nextflow run main.nf -c nextflow.config -profile conda,biowulf

Submit the job using sbatch:

sbatch run_manta_on_biowulf.sh
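
After submission, you can watch both the master job and the per-process jobs that Nextflow submits on your behalf:

# The master job and every task Nextflow submits to Slurm appear here
squeue -u $USER

# Follow pipeline progress from the directory the job was launched in
tail -f .nextflow.log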

Note: For more details on using Nextflow on Biowulf HPC, see the official documentation at: https://hpc.nih.gov/apps/nextflow.html


A template submission script, run_manta_on_biowulf.sh, is also included in the nci-dceg-flowiq directory of this repo for your reference.