Standardizing controlled vocabulary mapping vs. allowing flexibility for unknown tags and sites changing mappings #8

Open
bradfordcondon opened this issue Nov 26, 2018 · 3 comments

@bradfordcondon
Contributor

To discuss further with @mpoelchau and @childers.

Problem:

NCBI doesn't provide ontology mappings for attributes. Monica has done a lot of work going through all the attributes we are interested in. Now we need to assign them to terms. Our broad options are to create a custom NCBI ontology or to map the attributes to terms in existing ontologies. I'm always a fan of using existing terms if possible, as that's Tripal's approach... although since we're talking about NCBI, maybe we should be communicating with them.

Assuming we go ahead with mapping terms, we then have to consider how this module will associate the XML attributes with cvterms for properties.

Possible implementation: tag terms as associated with an NCBI XML tag?

We could use cvtermprop, or just a custom table, to associate XML tags with cvterms. We would then let users update the associations themselves and/or provide an interface to do so.

@bradfordcondon
Contributor Author

Resources that already exist for metadata/ontology mapping for NCBI:

CEDAR ? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977712/

@bradfordcondon
Contributor Author

bradfordcondon commented Nov 26, 2018

I drafted this message:

We are building a tool for importing metadata from NCBI and storing it in Chado, the community-standardized database schema. Doing so requires us to map each attribute to an ontology term, so we will be mapping the XML attributes available through the eutils API to ontology terms.

Rather than do these mappings ourselves in isolation, we want to work with the NCBI, perhaps even as part of a broader initiative to set internal metadata standards.

We're broadly interested in all the NCBI databases, but for now we are focusing on:

Assembly, BioSample, BioProject

What we are wondering is, for these metadata tags:

Are they standardized?
Are they already mapped to ontologies?

If so, are these mappings publicly available?

If the tags aren’t standardized or mapped to ontologies, can we work together and with the broader community to do so?

Take for example the N50 tag: OBI has a set of terms describing different N50 types.

The NCBI Assembly database defines this tag: contign50. Does it, or could it, map to the OBI contig N50 term? https://www.ebi.ac.uk/ols/ontologies/obi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0001941

I think an ideal outcome would be for NCBI to produce and publish its own ontology, with terms such as this one imported from other ontologies, so that each attribute downloaded from NCBI could be linked to an existing ontology term found on the EBI Ontology Lookup Service.
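As a rough illustration of what a single tag-to-term link could look like on our side, here is a minimal sketch that turns the OLS URL above into a compact term ID and stores it against the contign50 tag. The function name `curie_from_ols_url` and the `TAG_TO_TERM` dictionary are hypothetical names made up for this example; only the contign50 tag and the OBI URL come from this thread.

```python
from urllib.parse import urlparse, parse_qs


def curie_from_ols_url(url: str) -> str:
    """Extract a compact ID such as 'OBI:0001941' from an OLS term URL."""
    query = parse_qs(urlparse(url).query)
    iri = query["iri"][0]  # parse_qs percent-decodes the value for us
    local_id = iri.rsplit("/", 1)[-1]  # e.g. 'OBI_0001941'
    prefix, accession = local_id.split("_", 1)
    return f"{prefix}:{accession}"


# Hypothetical mapping table: NCBI XML tag name -> ontology term ID.
TAG_TO_TERM = {
    "contign50": curie_from_ols_url(
        "https://www.ebi.ac.uk/ols/ontologies/obi/terms"
        "?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0001941"
    ),
}

print(TAG_TO_TERM["contign50"])
```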

@bradfordcondon
Contributor Author

We really have two cases. The first is the more stable XML element types, for example <Organization>. For these, cvterm mappings are generally already taken care of by how they are stored in Chado: an organization becomes a Chado contact, with a type, which has a term, etc.

The second is the attributes, for example the <Attributes><Attribute type=tissue> leaf</Attribute> tag in BioSamples. Each attribute carries a name that needs to be mapped to a term, because its value will be stored in the Chado prop tables. It is these attribute mappings that really need to be robust and flexible.
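To make the second case concrete, here is a minimal sketch of pulling attributes out of a record and separating the ones we have a mapping for from unknown tags. It assumes a simplified XML fragment modeled on the example above (with the attribute value quoted so it parses); the real eutils output may use different attribute names. `split_attributes`, `KNOWN_TERMS`, and the `EXAMPLE:0000001` term ID are placeholders invented for this sketch, not real mappings.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed fragment; not copied from real eutils output.
SAMPLE_XML = """
<BioSample>
  <Attributes>
    <Attribute type="tissue">leaf</Attribute>
    <Attribute type="some_new_tag">42</Attribute>
  </Attributes>
</BioSample>
"""

# Placeholder mapping: tag name -> term ID (the ID here is made up).
KNOWN_TERMS = {"tissue": "EXAMPLE:0000001"}


def split_attributes(xml_text, known_terms):
    """Return (mapped, unmapped) attribute lists for one record."""
    root = ET.fromstring(xml_text.strip())
    mapped, unmapped = [], []
    for attr in root.iter("Attribute"):
        tag = attr.get("type")
        value = (attr.text or "").strip()
        if tag in known_terms:
            mapped.append((tag, known_terms[tag], value))
        else:
            # Unknown tags are kept so a site admin can map them later.
            unmapped.append((tag, value))
    return mapped, unmapped


mapped, unmapped = split_attributes(SAMPLE_XML, KNOWN_TERMS)
```

Keeping the unmapped list around, rather than dropping those tags, is one way to stay flexible when NCBI adds or renames attributes.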

storage options:

  • cvtermprop. The type would be local:eutils_attribute, and the value the XML attribute name. The main drawback of this option is that it isn't flexible enough if the same tag needs different mappings for different NCBI databases.

  • Custom table. Columns would be the tag name, the cvterm_id it is mapped to, and the NCBI database if a mapping needs to differ between databases. However, this would require a dedicated UI to manage.

  • Hardcoded array of mappings.
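The custom table option could be sketched roughly as follows. This uses an in-memory SQLite database purely as a stand-in for Chado's PostgreSQL, and the table name `tag_cvterm_map`, its column names, and the cvterm_id values are all hypothetical. The idea shown is a generic mapping per tag, with an optional database-specific row that takes priority.

```python
import sqlite3


def make_mapping_table(conn):
    conn.execute(
        """
        CREATE TABLE tag_cvterm_map (
            tag_name  TEXT NOT NULL,
            cvterm_id INTEGER NOT NULL,
            ncbi_db   TEXT  -- NULL means the mapping applies to every database
        )
        """
    )


def lookup_cvterm(conn, tag_name, ncbi_db):
    """Return the cvterm_id for a tag, preferring a database-specific row."""
    row = conn.execute(
        """
        SELECT cvterm_id FROM tag_cvterm_map
        WHERE tag_name = ? AND (ncbi_db = ? OR ncbi_db IS NULL)
        ORDER BY ncbi_db IS NULL  -- database-specific rows sort first
        LIMIT 1
        """,
        (tag_name, ncbi_db),
    ).fetchone()
    return row[0] if row else None  # None signals an unknown tag


conn = sqlite3.connect(":memory:")
make_mapping_table(conn)
conn.executemany(
    "INSERT INTO tag_cvterm_map VALUES (?, ?, ?)",
    [
        ("tissue", 101, None),         # generic mapping (made-up id)
        ("tissue", 202, "biosample"),  # database-specific override (made-up id)
    ],
)
```

With this shape, a site that changes a mapping only edits a row, and an unknown tag simply returns no match instead of failing the import.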
