Exomiser

Pipeline documentation

Pipeline description

Pipeline overview

  • Name: exomiser-pipeline-nf
  • Tools: exomiser
  • Version: 12.1.0

This is a fully containerised Nextflow pipeline that runs Exomiser on either a single-sample VCF file or a trio VCF file.

Exomiser is a tool that performs genome-wide prioritisation of genomic variants, including non-coding and regulatory variants, using patient phenotypes to differentiate candidate genes.

To perform an analysis, Exomiser requires the patient's genome/exome in VCF format and their phenotype encoded as HPO terms. Exomiser can also analyse trios and small family genomes.

The main input of the pipeline (families_file) is a TSV file, and the main output is an HTML report containing the pathogenicity scores of the called variants.

Input

--families_file

This is a TSV file that contains the following tab-separated columns:

run_id proband_id hpo vcf_path vcf_index_path proband_sex mother_id father_id

The vcf_path column can contain the path to either a multi-sample (trio) VCF or a single-sample VCF. In the case of a single-sample VCF, the last two columns (mother_id and father_id) must contain nan. An example can be found here.

In the hpo column, multiple comma-separated HPO terms can be present.
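
For illustration, a single-sample entry might look like the following (the run and sample IDs, paths and sex encoding are hypothetical; check them against the example file linked above). Columns are tab-separated:

    run_001    SAMPLE_01    HP:0001156,HP:0001363    s3://my-bucket/sample_01.vcf.gz    s3://my-bucket/sample_01.vcf.gz.tbi    F    nan    nan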

--application_properties

This is a file needed by Exomiser to run. It specifies where to find the reference data as well as the versioning of the reference genome. An example can be found here.
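
As a rough sketch (not this pipeline's exact file), an Exomiser 12.x application.properties typically sets the data directory and the data versions; the path and version strings below are placeholders:

    exomiser.data-directory=/path/to/exomiser-data-bundle
    exomiser.hg19.data-version=2109
    exomiser.phenotype.data-version=2109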

--auto_config_yml

This is a file needed by Exomiser to run. It is an analysis template containing placeholders that are filled in by the second process of the pipeline just before Exomiser is run. The one used for testing can be found here.
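
A trimmed sketch of what such a template might look like is shown below; the placeholder tokens are hypothetical (the real template's placeholders may differ), and a full Exomiser analysis YAML contains additional options such as frequency and pathogenicity sources:

    analysis:
      genomeAssembly: hg19
      vcf: VCF_PLACEHOLDER
      ped: PED_PLACEHOLDER
      proband: PROBAND_PLACEHOLDER
      hpoIds: HPO_PLACEHOLDER
      steps: [
        hiPhivePrioritiser: {}
      ]
    outputOptions:
      outputFormats: [HTML, JSON, VCF]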

--exomiser_profile

This parameter defines which kind of reference data is used. It accepts "test" or "full".

The "full" profile points to the reference data bundle needed by Exomiser (~120 GB!). A copy of these files can be found here. The reference dataset is exposed as a parameter, allowing the data to be pulled from any resource (e.g. cloud storage, local storage, FTP); Nextflow automatically takes care of fetching the data without any changes to the pipeline itself.
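
For example, the bundle can be supplied from object storage (as in the usage example below) or from a local copy; the local path here is purely illustrative:

    # from S3
    --exomiser_data 's3://lifebit-featured-datasets/pipelines/exomiser-data-bundle'
    # from a local copy (illustrative path)
    --exomiser_data '/data/exomiser-data-bundle'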

The "test" profile points to some mock data used in testing.

There are other parameters that can be tweaked to customise the behaviour of the pipeline. These are listed in nextflow.config.
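
As a sketch, such parameters can be overridden on the command line or collected in a custom config file passed to Nextflow with -c; the parameter names below are the ones documented in this README, and all values are illustrative:

    // my_params.config (illustrative)
    params {
      exomiser_profile       = 'full'
      prioritisers           = 'hiPhivePrioritiser'
      families_file          = 's3://my-bucket/fam_file.tsv'
      application_properties = 's3://my-bucket/application.properties'
      auto_config_yml        = 's3://my-bucket/auto_config.yml'
    }

    nextflow run main.nf -c my_params.config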

Processes

Here is the list of steps performed by this pipeline.

  1. process ped_hpo_creation - this process uses a Python script to produce the pedigree (PED) file required by Exomiser (see the example PED lines after this list).
  2. process exomiser - this process generates the Exomiser analysis configuration file from the auto_config_yml template and runs Exomiser.
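
For reference, a PED file uses the standard six-column format: family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female) and affected status (1 = unaffected, 2 = affected). The trio below is purely illustrative:

    FAM001    PROBAND_01    FATHER_01    MOTHER_01    1    2
    FAM001    FATHER_01     0            0            1    1
    FAM001    MOTHER_01     0            0            2    1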

Output

  • an HTML and a JSON file containing a report on the analysis
  • the autoconfig file, for reproducibility purposes
  • a VCF file with the called variants identified as causative

Usage

The pipeline can be run as follows:

nextflow run main.nf --families_file 's3://lifebit-featured-datasets/pipelines/exomiser-nf/fam_file.tsv' \
        --prioritisers 'hiPhivePrioritiser' \
        --exomiser_data 's3://lifebit-featured-datasets/pipelines/exomiser-data-bundle' \
        --application_properties 's3://lifebit-featured-datasets/pipelines/exomiser-nf/application.properties' \
        --auto_config_yml 's3://lifebit-featured-datasets/pipelines/exomiser-nf/auto_config.yml'

Testing

To run the pipeline with Docker (used by default), type the following commands:

To test the pipeline on a multi-VCF:

nextflow run main.nf -profile test_full_family

or

nextflow run main.nf -profile test_full_multi_hpo

To test the pipeline on a single-sample VCF:

nextflow run main.nf -profile test_full_single_vcf

Be careful when running these tests: the pipeline stages the ~120 GB of reference data required by Exomiser, so the data staging alone takes a while!

Running on CloudOS

Profiles

| profile name | Run locally | Run on CloudOS | description |
| --- | --- | --- | --- |
| test_full_family | Due to the size of the required data, tested on a c5.4xlarge EC2 machine | Successful | tests the pipeline on a multi-VCF with trio information |
| test_full_single_vcf | Due to the size of the required data, tested on a c5.4xlarge EC2 machine | Successful | tests the pipeline on a single-sample VCF |
| test_full_multi_hpo | Due to the size of the required data, tested on a c5.4xlarge EC2 machine | Successful | tests the pipeline on a multi-VCF with trio information using multiple HPO terms |