This repository has been archived by the owner on Aug 20, 2024. It is now read-only.

Parameters meta-schema - first draft #133

Merged
merged 4 commits into from
Nov 20, 2023

Conversation

ewels (Member) commented Nov 17, 2023

See nf-core/tools#2436

Still missing conditional validation, e.g. that maxLength is only set when "type": "string"
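To make the missing conditional check concrete, here is a minimal sketch in plain Python rather than in the meta-schema itself. The function name `misplaced_keywords` and the `KEYWORD_TYPES` table are hypothetical; only the maxLength/string pairing comes from the comment above, and the other pairings are assumptions added for illustration.

```python
# Hypothetical table: keyword -> parameter types it is assumed to make sense for.
# Only the maxLength/"string" pairing is taken from the PR description.
KEYWORD_TYPES = {
    "maxLength": {"string"},
    "minLength": {"string"},
    "pattern": {"string"},
    "minimum": {"integer", "number"},
    "maximum": {"integer", "number"},
}


def misplaced_keywords(param):
    """Return the keywords in one parameter definition that do not fit its "type"."""
    param_type = param.get("type")
    return sorted(
        key
        for key, allowed in KEYWORD_TYPES.items()
        if key in param and param_type not in allowed
    )


print(misplaced_keywords({"type": "integer", "maxLength": 10}))
print(misplaced_keywords({"type": "string", "maxLength": 10}))
```

In a JSON Schema meta-schema the same rule would probably be written with conditional keywords such as if/then, which is the part this PR leaves for later.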

nvnieuwk (Collaborator) left a comment

😉

parameters_meta_schema.json (outdated, resolved)
ewels (Member, Author) commented Nov 17, 2023

Testing on all nf-core pipelines...

Get the pipeline schema files:

wget https://nf-co.re/pipelines.json

for p in $(jq -r ".remote_workflows[].name" pipelines.json)
do
wget "https://raw.githubusercontent.com/nf-core/${p}/master/nextflow_schema.json" -O "${p}_schema.json"
done
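The download loop above pairs each pipeline name with a raw GitHub URL and a local filename. A small Python sketch of that mapping is shown here; it does no network access, the helper name `schema_downloads` is hypothetical, and the two-entry sample stands in for the real pipelines.json, whose only assumed structure is the `remote_workflows[].name` path used by the jq call.

```python
import json

# URL pattern copied from the wget command in the comment above.
RAW_URL = "https://raw.githubusercontent.com/nf-core/{name}/master/nextflow_schema.json"


def schema_downloads(pipelines_json):
    """Map each local output filename to the raw schema URL for that pipeline."""
    data = json.loads(pipelines_json)
    return {
        f"{wf['name']}_schema.json": RAW_URL.format(name=wf["name"])
        for wf in data["remote_workflows"]
    }


# Tiny stand-in for pipelines.json; the real file lists every nf-core pipeline.
sample = '{"remote_workflows": [{"name": "rnaseq"}, {"name": "mag"}]}'
for filename, url in schema_downloads(sample).items():
    print(filename, url)
```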

Validate with check-jsonschema:

for f in *schema.json
do
printf "# $f\n\n"; check-jsonschema --schemafile ../parameters_meta_schema.json $f; printf "\n\n"
done

airrflow_schema.json

Schema validation errors were encountered.
  airrflow_schema.json::$.definitions.clonal_analysis_options.properties.clonal_threshold.type: ['string', 'number'] is not of type 'string'
  airrflow_schema.json::$.definitions.clonal_analysis_options.properties.clonal_threshold.type: ['string', 'number'] is not one of ['string', 'boolean', 'integer', 'number']
  airrflow_schema.json::$.definitions.report_options.description: '' is too short

ampliseq_schema.json

Schema validation errors were encountered.
  ampliseq_schema.json::$.definitions.main_arguments.description: '' is too short
  ampliseq_schema.json::$.definitions.sequencing_input.description: '' is too short
  ampliseq_schema.json::$.definitions.read_trimming_and_quality_filtering.description: '' is too short
  ampliseq_schema.json::$.definitions.downstream_analysis.description: '' is too short
  ampliseq_schema.json::$.definitions.pipeline_report.description: '' is too short
  ampliseq_schema.json::$.definitions.skipping_specific_steps.description: '' is too short

atacseq_schema.json

ok -- validation done

bacass_schema.json

Schema validation errors were encountered.
  bacass_schema.json::$.definitions.contamination_screening.description: '' is too short
  bacass_schema.json::$.definitions.assembly_polishing.description: '' is too short
  bacass_schema.json::$.definitions.skipping_options.description: '' is too short

bactmap_schema.json

Schema validation errors were encountered.
  bactmap_schema.json::$.definitions.compulsory_parameters.description: '' is too short
  bactmap_schema.json::$.definitions.optional_pipeline_steps.description: '' is too short

bamtofastq_schema.json

ok -- validation done

cageseq_schema.json

ok -- validation done

callingcards_schema.json

Schema validation errors were encountered.
  callingcards_schema.json::$.definitions.alignment_options.description: '' is too short

chipseq_schema.json

ok -- validation done

circdna_schema.json

ok -- validation done

circrna_schema.json

ok -- validation done

clipseq_schema.json

ok -- validation done

coproid_schema.json

Schema validation errors were encountered.
  coproid_schema.json::$.definitions.pipeline_parameters.description: '' is too short

createpanelrefs_schema.json

ok -- validation done

createtaxdb_schema.json

ok -- validation done

crisprseq_schema.json

ok -- validation done

crisprvar_schema.json

Several files failed to parse.
  Failed to parse crisprvar_schema.json

cutandrun_schema.json

Schema validation errors were encountered.
  cutandrun_schema.json::$.definitions.flow_switching_options.description: '' is too short
  cutandrun_schema.json::$.definitions.trimming_options.description: '' is too short
  cutandrun_schema.json::$.definitions.pipeline_options.description: '' is too short
  cutandrun_schema.json::$.definitions.reporting_options.description: '' is too short

ddamsproteomics_schema.json

Several files failed to parse.
  Failed to parse ddamsproteomics_schema.json

deepvariant_schema.json

Several files failed to parse.
  Failed to parse deepvariant_schema.json

demultiplex_schema.json

Schema validation errors were encountered.
  demultiplex_schema.json::$.definitions.workflow_options.description: '' is too short

denovohybrid_schema.json

Several files failed to parse.
  Failed to parse denovohybrid_schema.json

detaxizer_schema.json

ok -- validation done

diaproteomics_schema.json

Schema validation errors were encountered.
  diaproteomics_schema.json::$.definitions.spectral_library_generation.description: '' is too short
  diaproteomics_schema.json::$.definitions.pseudo_irt_generation.description: '' is too short
  diaproteomics_schema.json::$.definitions.spectral_library_merging.description: '' is too short
  diaproteomics_schema.json::$.definitions.spectral_library_rt_alignment.description: '' is too short
  diaproteomics_schema.json::$.definitions.dia_spectral_library_search.description: '' is too short
  diaproteomics_schema.json::$.definitions.false_discovery_rate_estimation.description: '' is too short
  diaproteomics_schema.json::$.definitions.ms2_chromatogram_alignment.description: '' is too short
  diaproteomics_schema.json::$.definitions.output_summary.description: '' is too short

differentialabundance_schema.json

Schema validation errors were encountered.
  differentialabundance_schema.json::$.definitions.observations_options.description: '' is too short
  differentialabundance_schema.json::$.definitions.deseq2_specific_options_rna_seq_only.description: '' is too short
  differentialabundance_schema.json::$.definitions.limma_specific_options_microarray_only.description: '' is too short
  differentialabundance_schema.json::$.definitions.gsea.description: '' is too short
  differentialabundance_schema.json::$.definitions.shiny_app_settings.description: '' is too short
  differentialabundance_schema.json::$.definitions.reporting_options.description: '' is too short

dualrnaseq_schema.json

Schema validation errors were encountered.
  dualrnaseq_schema.json::$.definitions.star_salmon_alignment_based_mode.description: '' is too short
  dualrnaseq_schema.json::$.definitions.htseq_features.description: '' is too short
  dualrnaseq_schema.json::$.definitions.rna_mapping_statistics.description: '' is too short

eager_schema.json

ok -- validation done

epitopeprediction_schema.json

ok -- validation done

exoseq_schema.json

Several files failed to parse.
  Failed to parse exoseq_schema.json

fastquorum_schema.json

ok -- validation done

fetchngs_schema.json

ok -- validation done

funcscan_schema.json

ok -- validation done

genomeannotator_schema.json

Several files failed to parse.
  Failed to parse genomeannotator_schema.json

genomeassembler_schema.json

ok -- validation done

genomeskim_schema.json

ok -- validation done

gwas_schema.json

ok -- validation done

hgtseq_schema.json

Schema validation errors were encountered.
  hgtseq_schema.json::$.definitions.run_options.description: '' is too short

hic_schema.json

ok -- validation done

hicar_schema.json

ok -- validation done

hlatyping_schema.json

ok -- validation done

imcyto_schema.json

Several files failed to parse.
  Failed to parse imcyto_schema.json

isoseq_schema.json

ok -- validation done

kmermaid_schema.json

Several files failed to parse.
  Failed to parse kmermaid_schema.json

liverctanalysis_schema.json

ok -- validation done

lncpipe_schema.json

Several files failed to parse.
  Failed to parse lncpipe_schema.json

mag_schema.json

Schema validation errors were encountered.
  mag_schema.json::$.definitions.input_output_options.properties.input.format: 'file-path-pattern' is not one of ['file-path', 'directory-path', 'path']
  mag_schema.json::$.definitions.quality_control_for_short_reads_options.description: '' is too short
  mag_schema.json::$.definitions.quality_control_for_long_reads_options.description: '' is too short
  mag_schema.json::$.definitions.taxonomic_profiling_options.properties.gtdbtk_min_completeness.minimum: 0.01 is not of type 'integer'
  mag_schema.json::$.definitions.assembly_options.description: '' is too short
  mag_schema.json::$.definitions.gene_prediction_and_annotation_options.description: '' is too short
  mag_schema.json::$.definitions.binning_options.description: '' is too short
  mag_schema.json::$.definitions.bin_quality_check_options.description: '' is too short

marsseq_schema.json

ok -- validation done

mcmicro_schema.json

ok -- validation done

metaboigniter_schema.json

Schema validation errors were encountered.
  metaboigniter_schema.json::$.definitions.internal_library_quantification_and_identification_parameters_negative_mode.properties.ipo_integrate_library_neg.help_text: '' is too short

metapep_schema.json

ok -- validation done

metatdenovo_schema.json

ok -- validation done

methylseq_schema.json

Schema validation errors were encountered.
  methylseq_schema.json::$.definitions.alignment_options.description: '' is too short
  methylseq_schema.json::$.definitions.bwa_meth_options.description: '' is too short
  methylseq_schema.json::$.definitions.qualimap_options.properties.bamqc_regions_file: 'type' is a required property
  methylseq_schema.json::$.definitions.skip_pipeline_steps.description: '' is too short

mhcquant_schema.json

Schema validation errors were encountered.
  mhcquant_schema.json::$.definitions.database_options.description: '' is too short
  mhcquant_schema.json::$.definitions.preprocessing.description: '' is too short
  mhcquant_schema.json::$.definitions.mass_spectrometry_data_processing.description: '' is too short
  mhcquant_schema.json::$.definitions.rescoring.description: '' is too short
  mhcquant_schema.json::$.definitions.quantification_options.description: '' is too short
  mhcquant_schema.json::$.definitions.mhc_affinity_prediction.description: '' is too short
  mhcquant_schema.json::$.definitions.variant_options.description: '' is too short

mnaseseq_schema.json

Several files failed to parse.
  Failed to parse mnaseseq_schema.json

molkart_schema.json

ok -- validation done

multiplesequencealign_schema.json

ok -- validation done

nanoseq_schema.json

Schema validation errors were encountered.
  nanoseq_schema.json::$.definitions.input_output_options.properties.protocol.format: 'sample-type' is not one of ['file-path', 'directory-path', 'path']

nanostring_schema.json

ok -- validation done

nascent_schema.json

Schema validation errors were encountered.
  nascent_schema.json::$.definitions.alignment_options.description: '' is too short

neutronstar_schema.json

Several files failed to parse.
  Failed to parse neutronstar_schema.json

pangenome_schema.json

Schema validation errors were encountered.
  pangenome_schema.json::$.definitions.seqwish_options.description: '' is too short

pathogensurveillance_schema.json

ok -- validation done

pgdb_schema.json

Schema validation errors were encountered.
  pgdb_schema.json::$.definitions.gnomad_variant_proteomes.description: '' is too short

phageannotator_schema.json

ok -- validation done

phyloplace_schema.json

ok -- validation done

pixelator_schema.json

Schema validation errors were encountered.
  pixelator_schema.json::$.definitions.adapterqc_options: 'type' is a required property
  pixelator_schema.json::$.definitions.adapterqc_options.properties.adapterqc_mismatches.maximum: 0.9 is not of type 'integer'
  pixelator_schema.json::$.definitions.demux_options: 'type' is a required property
  pixelator_schema.json::$.definitions.demux_options.properties.demux_mismatches.maximum: 0.9 is not of type 'integer'
  pixelator_schema.json::$.definitions.collapse_options: 'type' is a required property
  pixelator_schema.json::$.definitions.graph_options: 'type' is a required property
  pixelator_schema.json::$.definitions.annotate_options: 'type' is a required property
  pixelator_schema.json::$.definitions.analysis_options: 'type' is a required property
  pixelator_schema.json::$.definitions.report_options: 'type' is a required property
  pixelator_schema.json::$.definitions.global_config_options.properties.pixelator_container.format: 'url' is not one of ['file-path', 'directory-path', 'path']

proteinfold_schema.json

Schema validation errors were encountered.
  proteinfold_schema.json::$.definitions.colabfold_dbs_and_parameters_path_options.properties.colabfold_alphafold2_params_tags.type: 'object' is not one of ['string', 'boolean', 'integer', 'number']

proteomicslfq_schema.json

Schema validation errors were encountered.
  proteomicslfq_schema.json::$.definitions.database_search.description: '' is too short
  proteomicslfq_schema.json::$.definitions.peptide_re_indexing.description: '' is too short
  proteomicslfq_schema.json::$.definitions.consensus_id.description: '' is too short
  proteomicslfq_schema.json::$.definitions.protein_quantification.description: '' is too short
  proteomicslfq_schema.json::$.definitions.statistical_post_processing.description: '' is too short
  proteomicslfq_schema.json::$.definitions.quality_control.description: '' is too short

quantms_schema.json

Schema validation errors were encountered.
  quantms_schema.json::$.definitions.database_search.description: '' is too short
  quantms_schema.json::$.definitions.peptide_re_indexing.description: '' is too short
  quantms_schema.json::$.definitions.consensus_id.description: '' is too short
  quantms_schema.json::$.definitions.protein_quantification_lfq.description: '' is too short
  quantms_schema.json::$.definitions.statistical_post_processing.description: '' is too short
  quantms_schema.json::$.definitions.quality_control.description: '' is too short

radseq_schema.json

ok -- validation done

raredisease_schema.json

ok -- validation done

readsimulator_schema.json

ok -- validation done

riboseq_schema.json

Schema validation errors were encountered.
  riboseq_schema.json::$.definitions.reference_genome_options.properties.riboviz_index.format: 'dir-path' is not one of ['file-path', 'directory-path', 'path']
  riboseq_schema.json::$.definitions.Trimming_options.properties.adapter_sequence: 'type' is a required property

rnadnavar_schema.json

ok -- validation done

rnafusion_schema.json

ok -- validation done

rnaseq_schema.json

ok -- validation done

rnasplice_schema.json

Schema validation errors were encountered.
  rnasplice_schema.json::$.definitions.dexseq_deu_options.description: '' is too short
  rnasplice_schema.json::$.definitions.edger_deu_options.description: '' is too short
  rnasplice_schema.json::$.definitions.dexseq_dtu_options.description: '' is too short
  rnasplice_schema.json::$.definitions.miso.description: '' is too short
  rnasplice_schema.json::$.definitions.suppa_options.description: '' is too short

rnavar_schema.json

Schema validation errors were encountered.
  rnavar_schema.json::$.definitions.preprocessing.description: '' is too short
  rnavar_schema.json::$.definitions.variant_calling.description: '' is too short
  rnavar_schema.json::$.definitions.variant_annotation.description: '' is too short

sammyseq_schema.json

ok -- validation done

sarek_schema.json

Schema validation errors were encountered.
  sarek_schema.json::$.definitions.annotation.description: '' is too short

scflow_schema.json

ok -- validation done

scrnaseq_schema.json

Schema validation errors were encountered.
  scrnaseq_schema.json::$.definitions.mandatory_arguments.description: '' is too short
  scrnaseq_schema.json::$.definitions.alevin_options.description: '' is too short
  scrnaseq_schema.json::$.definitions.starsolo_options.description: '' is too short

slamseq_schema.json

Several files failed to parse.
  Failed to parse slamseq_schema.json

smartseq2_schema.json

Several files failed to parse.
  Failed to parse smartseq2_schema.json

smrnaseq_schema.json

ok -- validation done

spatialtranscriptomics_schema.json

ok -- validation done

spinningjenny_schema.json

ok -- validation done

ssds_schema.json

ok -- validation done

taxprofiler_schema.json

Schema validation errors were encountered.
  taxprofiler_schema.json::$.definitions.profiling_options.description: '' is too short
  taxprofiler_schema.json::$.definitions.postprocessing_and_visualisation_options.description: '' is too short

variantcatalogue_schema.json

ok -- validation done

vipr_schema.json

Several files failed to parse.
  Failed to parse vipr_schema.json

viralintegration_schema.json

ok -- validation done

viralrecon_schema.json

ok -- validation done

ewels (Member, Author) commented Nov 17, 2023

Almost all current validation errors are due to empty description strings...
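One way to back up that observation is to tally the error lines by kind. The sketch below assumes the line layout seen in the output above (`file::$.path: message`); the names `classify` and `tally_errors` and the category labels are hypothetical, and the sample is a handful of lines copied from the results.

```python
from collections import Counter


def classify(message):
    """Put one validator message into a coarse, made-up category."""
    if message.endswith("is too short"):
        return "empty string"
    if message.endswith("is a required property"):
        return "missing required property"
    if " is not of type " in message:
        return "wrong type"
    if " is not one of " in message:
        return "value not allowed"
    return "other"


def tally_errors(report):
    """Count error lines in validator output by category."""
    counts = Counter()
    for line in report.splitlines():
        if "::$" not in line:
            continue  # skip headings, blank lines and "ok" lines
        _location, _, message = line.strip().partition(": ")
        counts[classify(message)] += 1
    return counts


sample = """\
Schema validation errors were encountered.
  airrflow_schema.json::$.definitions.report_options.description: '' is too short
  mag_schema.json::$.definitions.taxonomic_profiling_options.properties.gtdbtk_min_completeness.minimum: 0.01 is not of type 'integer'
  methylseq_schema.json::$.definitions.qualimap_options.properties.bamqc_regions_file: 'type' is a required property
  sarek_schema.json::$.definitions.annotation.description: '' is too short
"""
print(tally_errors(sample).most_common())
```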

Co-authored-by: Nicolas Vannieuwkerke <[email protected]>
ewels (Member, Author) commented Nov 18, 2023

I think this is going to be quite useful! If we allow empty description strings, these are the errors that we're left with. Basically all of them look like unintentional errors in the pipeline schemas:

airrflow

Schema validation errors were encountered.
  airrflow_schema.json::$.definitions.clonal_analysis_options.properties.clonal_threshold.type: ['string', 'number'] is not of type 'string'
  airrflow_schema.json::$.definitions.clonal_analysis_options.properties.clonal_threshold.type: ['string', 'number'] is not one of ['string', 'boolean', 'integer', 'number']

Yup, can't have an array for type in a Nextflow schema (even though plain JSON Schema allows it), so this is invalid:

"properties": {
    "clonal_threshold": {
        "type": ["string", "number"],
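The two airrflow messages come from two separate rules: "type" must be a single string, and it must be one of the four allowed values. A rough Python sketch of that pair of rules for a single parameter follows. The function name `type_problem` is hypothetical, the allowed list is copied from the error output, and the message wording only imitates the validator.

```python
# Allowed values as printed in the validation errors above.
ALLOWED_TYPES = ("string", "boolean", "integer", "number")


def type_problem(param):
    """Return a short message if a parameter's "type" is unusable, else None."""
    if "type" not in param:
        return "'type' is a required property"
    value = param["type"]
    if not isinstance(value, str):
        return f"{value!r} is not of type 'string'"
    if value not in ALLOWED_TYPES:
        return f"{value!r} is not one of {list(ALLOWED_TYPES)!r}"
    return None


print(type_problem({"type": ["string", "number"]}))
print(type_problem({"type": "object"}))
print(type_problem({"type": "number"}))
```

Unlike the real validator, this sketch stops at the first problem, so an array value produces one message rather than two.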

mag

Schema validation errors were encountered.
  mag_schema.json::$.definitions.taxonomic_profiling_options.properties.gtdbtk_min_completeness.minimum: 0.01 is not of type 'integer'

metaboigniter

Schema validation errors were encountered.
  metaboigniter_schema.json::$.definitions.internal_library_quantification_and_identification_parameters_negative_mode.properties.ipo_integrate_library_neg.help_text: '' is too short

methylseq

Schema validation errors were encountered.
  methylseq_schema.json::$.definitions.qualimap_options.properties.bamqc_regions_file: 'type' is a required property

nanoseq

Schema validation errors were encountered.
  nanoseq_schema.json::$.definitions.input_output_options.properties.protocol.format: 'sample-type' is not one of ['file-path', 'directory-path', 'path']

pixelator

Schema validation errors were encountered.
  pixelator_schema.json::$.definitions.adapterqc_options: 'type' is a required property
  pixelator_schema.json::$.definitions.adapterqc_options.properties.adapterqc_mismatches.maximum: 0.9 is not of type 'integer'
  pixelator_schema.json::$.definitions.demux_options: 'type' is a required property
  pixelator_schema.json::$.definitions.demux_options.properties.demux_mismatches.maximum: 0.9 is not of type 'integer'
  pixelator_schema.json::$.definitions.collapse_options: 'type' is a required property
  pixelator_schema.json::$.definitions.graph_options: 'type' is a required property
  pixelator_schema.json::$.definitions.annotate_options: 'type' is a required property
  pixelator_schema.json::$.definitions.analysis_options: 'type' is a required property
  pixelator_schema.json::$.definitions.report_options: 'type' is a required property
  pixelator_schema.json::$.definitions.global_config_options.properties.pixelator_container.format: 'url' is not one of ['file-path', 'directory-path', 'path']

proteinfold

Schema validation errors were encountered.
  proteinfold_schema.json::$.definitions.colabfold_dbs_and_parameters_path_options.properties.colabfold_alphafold2_params_tags.type: 'object' is not one of ['string', 'boolean', 'integer', 'number']

This is valid JSON-schema, but not officially supported by the Nextflow schema stack:

"colabfold_alphafold2_params_tags": {
    "type": "object",
    "description": "Dictionary with Alphafold2 parameters tags",

riboseq

Schema validation errors were encountered.
  riboseq_schema.json::$.definitions.reference_genome_options.properties.riboviz_index.format: 'dir-path' is not one of ['file-path', 'directory-path', 'path']
  riboseq_schema.json::$.definitions.Trimming_options.properties.adapter_sequence: 'type' is a required property

@ewels ewels marked this pull request as ready for review November 20, 2023 09:48
ewels (Member, Author) commented Nov 20, 2023

I've removed the zero-length description thing, as this schema is intended to validate that Nextflow schema files are valid on a functional level. We should build best-practice linting for this kind of thing into nf-core/tools separately.

It may be worth removing the non-empty string requirement for help_text too 🤔 Though that one isn't triggering any errors for nf-core pipelines (presumably due to differences in the underlying schema-builder tool stack), so it's not so important.
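As an illustration of the split between functional validation and best-practice linting, an empty-description check could live in a separate lint step along these lines. This is only a sketch: `empty_descriptions` is a hypothetical helper, not an existing nf-core/tools function, and it assumes groups sit under a top-level "definitions" key as in the error paths above.

```python
def empty_descriptions(schema):
    """List the definition groups whose "description" is an empty string."""
    return [
        name
        for name, group in schema.get("definitions", {}).items()
        if group.get("description") == ""
    ]


# Made-up schema fragment; group names borrowed from the sarek and rnavar results.
example = {
    "definitions": {
        "annotation": {"title": "Annotation", "description": ""},
        "preprocessing": {"title": "Preprocessing", "description": "Trim and align reads"},
    }
}
print(empty_descriptions(example))
```

A lint step like this can warn without failing, which keeps the meta-schema focused on whether a schema file works at all.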

},
"fa_icon": {
"type": "string",
"pattern": "^fa"
Collaborator:

How about adding a description with a link to Font Awesome?

Member, Author:

Description is a Nextflow schema thing, not a JSON schema thing (it's a non-standard field).

We already document it here: https://nextflow-io.github.io/nf-validation/nextflow_schema/nextflow_schema_specification/#fa_icon
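For reference, the "^fa" pattern in the snippet under review only checks how the string starts. The sketch below mimics that with Python's `re` module; `icon_ok` is a hypothetical helper, and using `re.search` reflects the usual JSON Schema behaviour of treating "pattern" as an unanchored search.

```python
import re

# Pattern copied from the fa_icon definition quoted in the review thread.
FA_ICON_PATTERN = re.compile("^fa")


def icon_ok(value):
    """True if value is a string that the "^fa" pattern would accept."""
    return isinstance(value, str) and FA_ICON_PATTERN.search(value) is not None


for candidate in ["fas fa-dna", "far fa-file-code", "dna", "fake-icon"]:
    print(candidate, icon_ok(candidate))
```

The last candidate passes as well, so the pattern guards against obviously wrong values rather than confirming that an icon name exists.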

mirpedrol (Collaborator) left a comment

LGTM

@ewels ewels merged commit bc51bc5 into master Nov 20, 2023
6 checks passed
@ewels ewels deleted the meta-schema branch November 20, 2023 10:49