Marine Omics Variant Pipeline

A Nextflow pipeline designed to process data from raw reads through to VCF.

graph TD;
	fastq-->fastqc;
	fastqc-->multiqc;
	fastq-->fastp;
	fastp-->multiqc;
	fastp-->bwa_mem;
	genome-->bwa_index;
	bwa_index-->bwa_mem;
	bwa_mem-->gatk_mark_duplicates;
	gatk_mark_duplicates-->samtools_stats;
	samtools_stats-->multiqc;
	gatk_mark_duplicates-->freebayes;
	gatk_mark_duplicates-->bcftools_mpileup;

Quick Start

Install Nextflow. If you are working on JCU infrastructure, please see these detailed instructions.
Run a test to make sure everything is installed properly. The command below should work on a Linux machine with Singularity installed (e.g. JCU HPC).

nextflow run marine-omics/movp -latest -profile singularity,test -r main

If you are working from a Mac or Windows machine you will need to use Docker.

nextflow run marine-omics/movp -latest -profile docker,test -r main

Create the sample csv file (example below)

sample,fastq_1,fastq_2
1,sample1_r1.fastq.gz,sample1_r2.fastq.gz
2,sample2_r1.fastq.gz,sample2_r2.fastq.gz

Paths should be given either as absolute paths or relative to the launch directory (where you invoked the nextflow command).
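Because a wrong path only shows up once the workflow starts, it can help to check the samples file first. The following is a minimal sketch, not part of movp: the function name check_samples is hypothetical, and it assumes the three-column layout shown above and that you run it from the launch directory.

```shell
# check_samples: report any fastq path in a samples CSV that does not exist.
# Relative paths are tested against the current directory, so run this from
# the directory where you will invoke nextflow.
check_samples() {
    csv="$1"
    missing=0
    {
        # Discard the header line (sample,fastq_1,fastq_2).
        read -r _header
        # The "|| [ -n ... ]" part handles a last line with no trailing newline.
        while IFS=, read -r sample fq1 fq2 || [ -n "$sample" ]; do
            # Skip blank lines.
            [ -z "$sample" ] && continue
            for fq in "$fq1" "$fq2"; do
                if [ ! -f "$fq" ]; then
                    echo "sample ${sample}: missing ${fq}" >&2
                    missing=1
                fi
            done
        done
    } < "$csv"
    # Exit status 0 if every file was found, 1 otherwise.
    return "$missing"
}
```

For example, `check_samples samples.csv && echo "all fastq files found"` prints the message only when every listed file exists.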

Choose a profile for your execution environment. This depends on where you are running the pipeline. movp comes with preconfigured profiles that should work on JCU infrastructure and Pawsey/Setonix. These are:
- JCU HPC (ie zodiac) : Use -profile zodiac
- genomics12 (HPC nodes without pbs): Use -profile genomics
- setonix: Use -profile setonix and set your slurm account with --slurm_account pawseyXXXX

If you need to customise further you can create your own custom.config file and invoke with option -c custom.config. See nextflow.config for ideas on what parameters can be set.

Run the workflow with your genome and samples file

nextflow run marine-omics/movp -profile singularity,zodiac -r main --genome <genomefile> --samples <samples.csv> --outdir myoutputs

Installing Nextflow on a system with an old java version.

Our JCU HPC systems are still running Java 8 but Nextflow requires Java 11 or newer. One way around this is to use sdkman to install and manage a different Java version. This is now the preferred way to install Java for Nextflow (see instructions here).

Troubleshooting

Docker image

When running for the first time nextflow will need to download the docker image from dockerhub and convert it to a singularity image. This can be slow, and nextflow doesn't make it easy to monitor progress. If this step is failing you can try downloading the image separately yourself.

First make sure you set your NXF_SINGULARITY_CACHEDIR variable to a path where you can permanently store the singularity images required by movp. For example, to put it in .nxf/singularity_cache in your home directory you would run:

mkdir -p ~/.nxf/singularity_cache
export NXF_SINGULARITY_CACHEDIR=${HOME}/.nxf/singularity_cache

This will create the directory and set the value of NXF_SINGULARITY_CACHEDIR for your current login session. To make this setting permanent you should add the export command shown above to your .bash_profile.
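One way to add the export line without duplicating it on repeated runs is sketched below. The function name persist_cachedir is hypothetical, and the sketch assumes your login shell reads ~/.bash_profile.

```shell
# persist_cachedir: append the export line to a profile file only if an
# identical line is not already present. Defaults to ~/.bash_profile.
persist_cachedir() {
    profile="${1:-$HOME/.bash_profile}"
    # Single quotes keep ${HOME} literal so it is expanded at login time.
    line='export NXF_SINGULARITY_CACHEDIR=${HOME}/.nxf/singularity_cache'
    touch "$profile"
    # -F fixed string, -x whole-line match, -q quiet.
    if ! grep -Fxq -- "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

Calling `persist_cachedir` with no arguments edits ~/.bash_profile; the change takes effect at your next login.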

Next pull the image from dockerhub. This command will download the image, convert it to singularity format and place it in your previously defined NXF_SINGULARITY_CACHEDIR. Note that this command is specific to container version 0.4.

singularity pull --name ${NXF_SINGULARITY_CACHEDIR}/iracooke-movp-0.4.img docker://iracooke/movp:0.4
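To confirm that the pull left a usable file in the cache, a small check along these lines may help. The function name movp_image_cached is hypothetical; the file name follows the pull command above, and the fallback cache path is the example directory used earlier.

```shell
# movp_image_cached: succeed only if the image file for the given container
# version exists in the cache directory and is not empty.
movp_image_cached() {
    version="${1:-0.4}"
    cache="${NXF_SINGULARITY_CACHEDIR:-$HOME/.nxf/singularity_cache}"
    [ -s "${cache}/iracooke-movp-${version}.img" ]
}
```

For example, `movp_image_cached 0.4 || echo "image not found in cache"`.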

Customising resource usage

The default resource limits for individual processes will often need tweaking for individual projects. This can be done fairly easily by creating a custom config file.

For example, if you want to increase memory and cpu requests for the bwa_mem_gatk and gatk_mark_duplicates steps you would create a custom config as follows:

process {
	withName: 'bwa_mem_gatk'{
		cpus=12
		memory=10.GB
	}
	withName: 'gatk_mark_duplicates'{
		cpus=12
		memory=30.GB
	}
}

Save this into a file called local.config and then tell nextflow to use it with the -c option as follows:

nextflow run marine-omics/movp -latest -profile singularity,zodiac -r main --genome <genomefile> --samples <samples.csv> --outdir myoutputs -c local.config

When running on the JCU HPC, jobs will be submitted to the queuing system, which is PBS Pro. Options available to set are described here.
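Scheduler-related settings can go in the same kind of custom config. The sketch below writes one from the shell, using the standard Nextflow process directives queue and time. The queue name 'normal' and the 24 hour walltime are hypothetical values, and the function name write_pbs_config is an assumption for illustration; check your site's queue names before using it.

```shell
# write_pbs_config: write a custom config that sets scheduler directives
# for one process. 'normal' is a hypothetical queue name and 24.h is an
# arbitrary walltime; replace both with values valid on your system.
write_pbs_config() {
    out="${1:-local.config}"
    cat > "$out" <<'EOF'
process {
    withName: 'gatk_mark_duplicates'{
        queue = 'normal'
        time = 24.h
    }
}
EOF
}
```

Running `write_pbs_config` creates local.config in the current directory, which can then be passed to nextflow with the -c option.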

Running in the background

If your workflow will take a long time you may want to run it in the background. This will ensure that the workflow continues even if you log out. To do this simply add the -bg option. Once the workflow is running in the background you can check progress using
