Skip to content

Commit

Permalink
Serve arbitrary md pages
Browse files Browse the repository at this point in the history
  • Loading branch information
neoformit committed Sep 14, 2023
1 parent 4ecac9f commit f0836ca
Show file tree
Hide file tree
Showing 5 changed files with 178 additions and 4 deletions.
11 changes: 11 additions & 0 deletions webapp/home/static/home/css/markdown.css
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
h1, h2, h3, h4, h5, h6 {
margin-top: 4rem;
}
h3, h4, h5, h6 {
margin-top: 2rem;
}
h2 {
margin-bottom: 2rem;
padding-bottom: .5rem;
border-bottom: 1px solid var(--color);
}
13 changes: 13 additions & 0 deletions webapp/home/templates/home/markdown-page.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
{% extends 'home/header.html' %}
{% load markdown %}
{% load static %}

{% block head %}
<link rel="stylesheet" href="{% static 'home/css/markdown.css' %}">
{% endblock %}

{% block content %}
<main>
{{ md_text|markdown|safe }}
</main>
{% endblock %}
124 changes: 124 additions & 0 deletions webapp/home/templates/home/pages/vgp-workflows.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
# How to use the international Vertebrate Genome Project workflows in Galaxy Australia

Anna Syme

## What are the VGP workflows?

* These are workflows for genome assembly, developed for the Vertebrate Genome Project
* Website: https://vertebrategenomesproject.org/
* Data at Genome Ark https://vgp.github.io/genomeark/
* Paper: Rhie, A., McCarthy, S.A., Fedrigo, O. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021). https://doi.org/10.1038/s41586-021-03451-0
* This paper covers the testing of tools and workflows. We recommend also looking at the Supplementary Information which is very informative.

## Are the VGP workflows in Galaxy Australia?

* An international team is working to create these workflows in Galaxy: for more information see [https://galaxyproject.org/projects/vgp/](https://galaxyproject.org/projects/vgp/)
* In Galaxy Australia, you can access a set of these workflows by going to the [Genome Lab](https://genome.usegalaxy.org.au/), scroll to the Genome Assembly section, click on the Workflows tab.
* The workflows to import are:
* Assembly with PacBio HiFi data:
* BAM to FASTQ + QC v1.0
* Assembly with PacBio Hifi and HiC data:
* Kmer profiling
* Hifi assembly and HiC phasing
* HiC scaffolding
* Decontamination

## Can I use the VGP workflows in Galaxy Australia?

* Yes. You can run these on test data or real data. They are designed to work for vertebrate genomes where you have PacBio Hifi data and HiC data.
* The workflows are described in Galaxy Training Network materials.
* [VGP assembly pipeline - short version, although still with complete workflow](https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_workflow_training/tutorial.html)
* [VGP assembly pipeline - long version](https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html)

## What data do I need?

Overall, you need these inputs:
* HiFi reads as collection
* If HiFi data in BAM format, convert to FASTQ using the BAM to FASTQ + QC v1.0 workflow
* then group output FASTQ files into a single collection
* HiC reads as R1 and R2

For each of the other workflows, you will need these inputs:

* WF1 Kmer profiling
* Inputs:
* HiFi reads in collection
* WF4 Hifi assembly and HiC phasing
* Inputs:
* HiFi reads in collection
* HiC R1, HiC R2,
* from WF1: genomescope model parameters, genomescope summary, mery db
* WF8a HiC scaffolding
* Inputs:
* From WF4: Assembly in gfa format
* From WF4: estimated genome size
* HiC R1, Hi C R2
* Settings:
* For restriction enzymes: set correctly
* For Input GFA: Generates the initial set of paths: set true (if using assembly from Hifiasm)
* WF9 Decontamination
* Inputs:
* From WF8a: Scaffolded assembly in FASTA format
* Settings:
* For step "ID non-target contaminants": select Kraken 2 database : choose: Prebuilt Refseq indexes: PlusPF (Standard plus protozoa and fungi)
* For step "blast mitochondria db": choose locally installed blast database: choose RefSeq mitochondrion

## Do I need any background knowledge before I run these workflows?

* We recommend the Galaxy Training Network tutorials.
* Introduction to Galaxy: https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html
* Assembly: https://training.galaxyproject.org/training-material/topics/assembly/tutorials/general-introduction/tutorial.html
* QC: https://training.galaxyproject.org/training-material/topics/sequence-analysis/tutorials/quality-control/tutorial.html
* VGP: https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html
* These should be enough to get you started running the VGP workflows, but there are many additional tutorials that would also be useful.
* As most species have never had their genome sequenced, it is not possible to guarantee existing workflows are optimal for new data.
* It is most likely that any new genome assembly will have its own set of required workflow and analysis customisations to account for things such as ploidy and repeats.
* Usually, an assembly workflow will need testing and customising, in concert with reading the biological domain literature.

## What is going on in the workflows?

* The workflows are large and have many steps.
* To better understand the workflow, look at the workflow canvas, view each tool and its settings, see how each step is connected.
* Consider what outputs you would require and make sure it is clear where they are and what they are called.
* For a description of many of the tools and the default parameters that have been set, see the [Galaxy tutorial for VGP workflows](https://training.galaxyproject.org/training-material/topics/assembly/tutorials/vgp_genome_assembly/tutorial.html).

## Run the workflows on test data

* Look at the outputs and see if it makes sense and all tools ran.

## Import real data

* If you will be using real vertebrate genome data, it is likely you will need more Galaxy storage. Contact the Galaxy Australia team to discuss/request.
* Import your real data sets into Galaxy.
* Or, to use real VGP data, go to **Upload Data: Choose remote files: Genome Ark: species** and choose a species.
* Note: not all species have data from all Hifi and HiC sources.
* Note that you will likely have to convert the data into the correct formats required. Alternatively, modify the workflows themselves to accept the data in the formats you have.

## Is there anything I need to change in the workflows?

* Most likely, yes. Look at every tool and the parameters to check it suits your data and aims.
* Examples of things that might need changing:
* Set the kmer size in the meryl tool, and make sure the setting in the Genomescope tool matches this. There is no absolute answer for what this setting should be; a common setting is 21.
* In the Quast tool (used several times), set the lineage appropriately (e.g. eukaryote) and set as "large genome"
* In the Busco tool (used several times), set the lineage appropriately (e.g. Vertebrata)

## Do I need to keep all of the output files from the workflow?

* No. In particular, some of the intermediate output files are very large, e.g., BAM files. You may wish to not keep these.
* To automatically not keep these files during the workflow run: go to the workflow canvas, see the box for a tool, see the output files, and untick any files you will not need.

## Run the workflows on real data

* Modify workflows as needed: label or tag outputs, change tool parameters, swap tools, add or delete steps.
* For large data we would recommend testing tools and workflows on a subset of the data first.

## Where to see the outputs

* Your Galaxy history holds the outputs from the workflow run.
* Some files may be hidden - click on "show hidden" to see.
* In the top Galaxy menu, go to User: Workflow invocations to see details of the workflow run.

## What to do if a tool or workflow fails?

* Read the error message to see if you can troubleshoot the issue. Otherwise, contact [email protected]
* If you are able to, re-run the tool or workflow in case it was a temporary connection issue.
4 changes: 2 additions & 2 deletions webapp/home/urls.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,8 @@
api.subdomain_feedback,
name="subdomain_feedback"),

# Arbitrary HTML pages
re_path(r'^[\w\d\_-]+.html$', views.page, name='home_page'),
# Arbitrary *.html / *.md pages
re_path(r'^[\w\d\_-]+\.(?:html|md)$', views.page, name='html_pages'),

# Redirects
path('galaxy', redirects.homepage, name="redirect_home"),
Expand Down
30 changes: 28 additions & 2 deletions webapp/home/views.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import os
import logging
from django.conf import settings
from django.template import TemplateDoesNotExist
from django.template import TemplateDoesNotExist, loader
from django.shortcuts import render, get_object_or_404
from django.http import Http404
from django.template.loader import get_template
Expand Down Expand Up @@ -150,17 +150,33 @@ def user_request_resource_access(request, resource):
return render(request, template, {'form': form})


def page(request):
def page(request, *args):
"""Serve an arbitrary static page."""
logger.warning(args)
template = f'home/pages/{request.path}'
templates_dir = os.path.join(
settings.BASE_DIR,
'home/templates/home/pages')
if os.path.basename(template) not in os.listdir(templates_dir):
raise Http404
if request.path.endswith('.md'):
return md_page(request, template)
return render(request, template)


def md_page(request, template):
"""Return a markdown file rendered to HTML."""
md_path = (
loader.get_template(template)
.origin.name
)
with open(md_path) as f:
markdown = f.read()
return render(request, 'home/markdown-page.html', {
'md_text': markdown,
})


def aaf_info(request):
"""Show current list of AAF institutions."""
return render(request, 'home/aaf-institutions.html', {
Expand All @@ -173,3 +189,13 @@ def australian_institutions(request):
return render(request, 'home/au-institutions.html', {
'institutions': get_institution_list(),
})


def vgp_workflows(request):
"""Show VGP workflows."""
path = loader.get_template('home/docs/vgp-workflows.md').origin.name
with open(path) as f:
text = f.read()
return render(request, 'home/docs/vgp-workflows.html', {
'md_text': text,
})

0 comments on commit f0836ca

Please sign in to comment.