Skip to content

h3agwas design principles

Scott.Hazelhurst edited this page May 26, 2017 · 2 revisions

The goal of the H3ABionet GWAS pipeline is to provide a portable and robust pipeline for reproducble genome-wide association studies.

A GWAS requires a complex set of analyses with complex dependancies between the analyses. We want to support GWAS work by supporting

  • reproducibility -- we can rerun the entire analysis from start to finish;
  • reusability -- we can run the entire analysis with different parameters in an efficient and consistent way;
  • portability -- we can run the analysis on a laptop, on a server, on a cluster, in the cloud. The same workflow can be used for all environments, even if the time taken may change;

We took the following into account:

  • The anticipated users are heterogeneous both in terms of their bioinformatics needs and the computing environments they will use.
  • Each GWAS is different -- it must be customisable to allow the bioinformaticists to set different parameters.

There are two key technologies that we use, Nextflow and Docker, both of which support our design principles.

Nextflow

Nextflow is a workflow language designed at the Centre for Genomic Regulation, Barcelona. Although it is a general workflow language for science, it comes out of a bioinformmatics group and strongly supports bioinformatics.

Our pipeline is built using Nextflow. However, users do not need to know anything about Nextflow. Obviously if you can do some programming you can customise and extend the pipelines, but you do not need to know Nextflow yourself.

Nextlow is very easy to install and is highly portable. It supports partial execution and pipelines that scale. Nextflow supports our worklow requirements very well.

Docker

A GWAS requires several software tools to be installed. Using Docker we can simplify the installation. Essentially, Docker wraps up all software dependancies into containers. Instead of installing all the dependancies, you can install Docker, easily and then install our containers. (In fact you don't need to explicitly install our containers, Nextflow and our workflow will do that for you automatically).

We expect that many of our users will use Docker. However, we recognise that this won't be suitable for everyone because many high performance computing centres do not support Docker for security reasons. It is possible to run our pipeline without Docker and will give instructions about which software needs to be installed.

In a later version of the pipeline, we hope to support Singularity.