Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variants long table CSV - clarifications #23

Open
tavareshugo opened this issue Jul 8, 2022 · 5 comments
Open

Variants long table CSV - clarifications #23

tavareshugo opened this issue Jul 8, 2022 · 5 comments
Labels
enhancement New feature or request invalid This doesn't seem right

Comments

@tavareshugo
Copy link
Collaborator

Need to clarify what some of the columns in variants_long_table.csv file indicate.
There's some inconsistencies sometimes, for example: REF_DP + ALT_DP doesn't add up; for indels, often REF_DP = ALT_DP with AF = 1.

It's also quite hard to know which of those mutations are actually part of the final consensus sequence.

Would be good to clarify these things.

@tavareshugo tavareshugo added enhancement New feature or request invalid This doesn't seem right labels Jul 8, 2022
@tavareshugo
Copy link
Collaborator Author

For example, the column FILTER includes PASS, even if a variant doesn't make it to the final consensus.
For Illumina, ivar consensus removes variants below a certain threshold of AF, but this is not indicated in that variants table.

We can see which variants are in the final consensus from the VCF files in

  • Illumina: variants/ivar/consensus/bcftools/*.filtered.vcf.gz
  • Nanopore: variants/medaka/*.pass.unique.vcf.gz (?? not sure - need to double-check)

@tavareshugo
Copy link
Collaborator Author

Possibly use bcftools merge | bcftools query to do this.

The bcftools query command used in the pipeline is here (notice it's slightly different depending on the caller).

@tavareshugo
Copy link
Collaborator Author

The strategy to bcftools merge | bcftools query doesn't really work, because:

  • After merging, samples without a variant in a given position are given ./.. It can be set to output 0/0, but that's not accurate either, because the samples may have had missing data in those positions.
  • The output of the query is not in "long" format, which is also not ideal for downstream analysis.

An alternative (which is a bit more involved) is to do the following:

# Create a shell variable with the sample names from our clean FASTA file
SAMPLES=$(grep ">" report/consensus.fa | sed 's/>//')

# Create a CSV file containing the column names of our new table
echo "sample,chrom,ref,alt" > report/variants.csv

# Use a for loop to run bcftools query on each sample
# adding the result of each iteration to the CSV file we created above
for SAMPLE in $SAMPLES
do
  bcftools query -f "${SAMPLE},%CHROM,%POS,%REF,%ALT\n" results/viralrecon/medaka/${SAMPLE}.pass.unique.vcf.gz >> report/variants.csv
done

Which results in a CSV file in long format. This is probably enough for reporting, etc.

@tavareshugo
Copy link
Collaborator Author

tavareshugo commented Oct 28, 2022

This is my current understanding about variants results:

  • --platform nanopore --> only uses filtered variants for all analysis (MultiQC, SnpEff, long table).
  • --platform illumina --> the "# SNPs" and "# INDELs" reported in the MultiQC table are for filtered variants. However, the long table and SnpEff analysis include all variants (including those that do not make it through the 0.75 threshold).
    So, if one wants a list of the actual variants that make it through to the consensus, then filter this table for AF > 0.75 and DP > 10.

I have also empirically checked this on a set of 48 samples each on Illumina and Nanopore pipelines.

@tavareshugo
Copy link
Collaborator Author

See #19 for details of where this is found in the pipeline

@tavareshugo tavareshugo changed the title Variant calling CSV Variants long table CSV - clarifications Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request invalid This doesn't seem right
Projects
None yet
Development

No branches or pull requests

1 participant