## Nextflow with SLURM Tutorial

Let's run a Nextflow pipeline, nf-core/rnaseq, on a SLURM cluster.

In [1]:
module load pcluster-helpers

In [2]:
pcluster-helper --help

Usage: pcluster-helper [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
      Install completion for the specified shell. [default: None]
  --show-completion [bash|zsh|fish|powershell|pwsh]
      Show completion for the specified shell, to copy it or customize the installation.

### Generate a Nextflow slurm.config

We'll use `pcluster-helper gen-nxf-slurm-config` to generate a default SLURM configuration file for Nextflow.

In [3]:
pcluster-helper gen-nxf-slurm-config --help

Usage: pcluster-helper gen-nxf-slurm-config [OPTIONS]

Generate a slurm.config for nextflow that is compatible with your cluster.
You will see a process label for each partition/node type.
Use the configuration in your process by setting the label to match the label
in the config.

Options:
  --include-memory / --no-include-memory
      Include scheduleable memory [default: no-include-memory]
  --scheduleable-memory FLOAT
      Schedulable amount of memory. Default is 95% [default: 0.95]

### One-time Nextflow SLURM Setup

If you want Nextflow to distribute your jobs across the SLURM cluster, you'll need to generate a SLURM executor config that Nextflow understands.

You can generate this once, and continue to use it for all Nextflow pipelines.

```bash
pcluster-helper \
 gen-nxf-slurm-config \
 --include-memory \
 --output ~/slurm.config \
 --overwrite
```
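Since the help text says the generated file contains one process label per partition/node type, it can be handy to list those labels before wiring them into a pipeline. The sketch below assumes the generated file uses Nextflow's `withLabel:` selectors; the function name `list_nxf_labels` is hypothetical and not part of pcluster-helper.

```bash
# List the process labels defined in a Nextflow config file.
# Assumption: labels appear as `withLabel: name`, `withLabel: 'name'`
# or `withLabel: "name"` selectors.
list_nxf_labels() {
  local config="$1"
  grep -oE "withLabel:[[:space:]]*['\"]?[A-Za-z0-9_.-]+" "$config" \
    | sed -E "s/withLabel:[[:space:]]*['\"]?//" \
    | sort -u
}

# Only inspect the real file if it has already been generated.
if [ -f "${HOME}/slurm.config" ]; then
  list_nxf_labels "${HOME}/slurm.config"
fi
```

A process then opts in to a partition by setting its `label` to one of the printed names.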

We'll also want a default configuration for processes that don't set a label. I'll choose a small instance type for this demonstration, but you should pick whichever instance best suits your workflows.

In [4]:
cat > ./slurm-default.config <<'EOF'
process {
  executor = 'slurm'
  queue = 'dev'
  cpus = 8
  memory = '30GB'
  clusterOptions = '--exclusive --constraint m5a2xlarge'
}
EOF
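
The `memory = '30GB'` value is in line with the 95% default of `--scheduleable-memory` from the help output, if we assume the `m5a2xlarge` constraint maps to an m5a.2xlarge instance with 32 GiB of RAM. Here is a rough sketch of that arithmetic. The function name is hypothetical, and rounding down to a whole gigabyte is my assumption rather than documented pcluster-helper behavior.

```bash
# Estimate the memory to request from SLURM for a node.
#   $1: total node memory in GB (for example 32 for an m5a.2xlarge)
#   $2: schedulable fraction (0.95 is the default shown by pcluster-helper)
# Rounds down to a whole GB; that rounding rule is an assumption.
schedulable_memory_gb() {
  local total_gb="$1"
  local fraction="${2:-0.95}"
  awk -v total="$total_gb" -v frac="$fraction" \
    'BEGIN { printf "%d\n", int(total * frac) }'
}

schedulable_memory_gb 32   # 32 * 0.95 = 30.4, rounded down to 30
```

Requesting slightly less than the full node memory leaves headroom for the operating system and the SLURM daemon, so jobs are less likely to be rejected as unschedulable.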

In [5]:
# Clean up files left over from a previous run
rm -f test.config test.config.*
rm -f samplesheet_test.csv samplesheet_test.csv.*
#rm -rf .nextflow
#sleep 1m
#rm -rf .nextflow*
#rm -rf work
#rm -rf results

In [6]:
wget --quiet https://raw.githubusercontent.com/nf-core/rnaseq/master/conf/test.config

In [7]:
wget --quiet https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.4/samplesheet_test.csv
wc -l < samplesheet_test.csv

2


In [8]:
head -n 3 samplesheet_test.csv 

sample,fastq_1,fastq_2,strandedness
WT_REP1,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357070_1.fastq.gz,s3://nf-core-awsmegatests/rnaseq/input_data/minimal/GSE110004/SRR6357070_2.fastq.gz,reverse


### Reduce the number of samples

The samplesheet has 8 rows, and I don't want to run all 8, so I'll keep only the header and the first sample.

In [9]:
head -n 2 samplesheet_test.csv > samplesheet_test_t.csv && mv samplesheet_test_t.csv samplesheet_test.csv
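
The same trimming step can be wrapped in a small helper so the number of rows is a parameter. This is only a sketch: `keep_first_rows` and the output filename in the example are hypothetical names, and it assumes the CSV has exactly one header line.

```bash
# Keep the header plus the first N data rows of a CSV.
#   $1: input CSV, $2: number of data rows to keep, $3: output CSV
keep_first_rows() {
  local input="$1" n="$2" output="$3"
  # N data rows plus one header line
  head -n "$((n + 1))" "$input" > "$output"
}

# Example (writes to a new file so the original samplesheet stays intact):
# keep_first_rows samplesheet_test.csv 1 samplesheet_one_sample.csv
```

Writing to a separate file avoids re-downloading the samplesheet when you later want to run more samples.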

In [10]:
module load nextflow

In [11]:
#nextflow -h

In [12]:
#nextflow run -h

In [13]:
export NXF_ANSI_LOG="False"
export NXF_CONDA_CACHEDIR="${HOME}/.conda/envs"
export NXF_SINGULARITY_CACHEDIR="${HOME}/.singularity"

module load nextflow
module load singularity-3.8.5-gcc-7.3.1-g2xhg2m

nextflow \
 run \
 nf-core/rnaseq \
 -with-trace \
 -with-report \
 -with-dag \
 -w ./work \
 --input ./samplesheet_test.csv \
 -resume \
 -profile slurm \
 -c test.config \
 -c slurm-default.config \
 -c ~/slurm.config \
 --outdir ./results 

exit 0

N E X T F L O W ~ version 22.10.1
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `https://github.com/nf-core/rnaseq` [jovial_bohr] DSL2 - revision: e049f51f02 [master]


------------------------------------------------------
nf-core/rnaseq v3.9
------------------------------------------------------
Core Nextflow options
  revision        : master
  runName         : jovial_bohr
  containerEngine : singularity
  launchDir       : /home/jillian/bioanalyze-hpc/docs/workflow-managers/nextflow
  workDir         : /home/jillian/bioanalyze-hpc/docs/workflow-managers/nextflow/work
  projectDir      : /home/jillian/.nextflow/assets/nf-core/

In [14]:
tree results

results
├── bbmap
│   ├── bbsplit
│   │   └── ref
│   │       ├── genome
│   │       │   └── 1
│   │       │       ├── chr1.chrom.gz
│   │       │       ├── info.txt
│   │       │       ├── merged_ref_8374379829187813017.fa.gz
│   │       │       ├── namelist.txt
│   │       │       ├── reflist.txt
│   │       │       ├── scaffolds.txt.gz
│   │       │       └── summary.txt
│   │       └── index
│   │           └── 1
│   │               ├── chr1_index_k13_c12_b1.block
│   │               └── chr1_index_k13_c12_b1.block2.gz
│   ├── WT_REP1_human_1.fastq.gz
│   ├── WT_REP1_human_2.fastq.gz
│   ├── WT_REP1_primary_1.fastq.gz
│   ├── WT_REP1_primary_2.fastq.gz
│   ├── WT_REP1_sarscov2_1.fastq.gz
│   ├── WT_REP1_sarscov2_2.fastq.gz
│   └── WT_REP1.stats.txt
├── deseq2
│   └── deseq2.dds.RData
├── fastqc
│   ├── WT_REP1_1_fastqc.html
│   ├── WT_REP1_1_fastqc.zip
│   ├── WT_REP1_2_fastqc.html
│   └── WT_REP1_2_fastqc.zip
├── multiqc
│   └── star_salmon
│       ├── multiqc_data
│       │   ├── junction_saturation_known.txt
│       │   ├── junction_saturation_novel.txt
│       │   ├── mqc_cutadapt_filtered_r

In [15]:
# cleanup
rm -rf .nextflow
#sleep 1m
#rm -rf .nextflow*

rm: cannot remove ‘.nextflow/cache/9ac749c4-d518-411e-906f-d39380596ec7/db’: Directory not empty
rm: cannot remove ‘.nextflow/cache/9ac749c4-d518-411e-906f-d39380596ec7/db/.nfs000000004fc0228a0000000c’: Device or resource busy
rm: cannot remove ‘.nextflow/cache/9ac749c4-d518-411e-906f-d39380596ec7/db/.nfs000000004fc0228d0000000d’: Device or resource busy
rm: cannot remove ‘.nextflow/cache/9ac749c4-d518-411e-906f-d39380596ec7/db/.nfs000000004fc0228e0000000e’: Device or resource busy
rm: cannot remove ‘.nextflow/cache/9ac749c4-d518-411e-906f-d39380596ec7/.nfs000000004f8009ea0000000f’: Device or resource busy

