Add nativeID output in mzIdentML/pepXML[/mzML/PIN] #324

chambm · 2024-07-11T16:57:52Z

To preserve Waters and Sciex source spectrum links, writing nativeID in the output pepXML/mzIdentML is necessary. Please read in the nativeID when reading spectra and pass it through when writing pepXML/mzIdentML elements for those spectra. It would be good to preserve it in mzML as well, but that's not as important. If the format allows, it might be helpful to write it in Percolator PIN format as well so it's simple to map PIN lines to the mzIdentML/pepXML equivalent.

Thanks!

fcyu · 2024-08-17T19:51:43Z

Hi Matt,

Sorry for the long delay. I finally got a chance to implement this feature. If it is not too much trouble, could you share some typical Waters and Sciex mzML files for me to test with?

> If the format allows, it might be helpful to write it in Percolator PIN format as well so it's simple to map PIN lines to the mzIdentML/pepXML equivalent.

I don't want to change the SpecId column because many downstream tools parse that column. As far as I know, there is no additional column that can be used for the native ID. Let me know if the latest Percolator supports a native ID column.

Also, may I ask if there is any harm in making the "index" not start from 0 and not be contiguous? I would like to use scan number - 1 as the index to keep it consistent when the mzML file is just a subset of the scans.

Thanks,

Fengchao

chambm · 2024-08-20T19:07:25Z

I'm glad to hear this is almost done!

Unfortunately AFAIK the mzML index must be 0-based and contiguous:
https://peptideatlas.org/tmp/mzML1.1.0.html#spectrum
Usually if you make an mzML from a format where you don't have a nativeID, only a scan number, you would just make the nativeID like "scan=123" or "index=122". But you already have a real nativeID. The problem here is to map the mzML/pepXML to the PIN TSV, right?
https://github.com/percolator/percolator/wiki/Interface#pintsv-tab-delimited-file-format

As far as I can tell from that, there should be a string PsmId column and a numeric ScanNr column. It seems pretty typical for the ScanNr column to be missing though. I can understand not wanting to change the PsmId format you've been using, but that's really the only column suitable for the nativeID. :(

Maybe easiest would just be to guarantee that the number and order of lines in the pepXML is the same in the PIN?

chambm · 2024-08-20T19:14:41Z

Here's an example Waters DDA file.
010208_ecoli_003-dda2.zip

fcyu · 2024-08-20T19:21:21Z

Mapping the mzML/pepXML to the PIN file is actually OK as long as we have a consistent way to extract the scan number (from the native ID if it is encoded in 1-D, such as Thermo's, or index + 1 if it is not in 1-D, such as Waters' and Sciex's). I asked because you wanted it. If it is OK not to have the native ID in the PIN file, I guess I can ignore it.

The problem arises when an mzML file is a subset of the original mzML file and its native ID does not encode the scan number in 1-D, as with Waters and Sciex. Since the scan number = index + 1, the scan numbers in the subset mzML are different from those in the original mzML, and it is hard to map across different tools. The options I can think of are not generating the subset mzML file, or making the index = scan number - 1 (which would not start at 0 and would not be contiguous).
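To make the subset problem concrete, here is a minimal Python sketch. It is not MSFragger's code: the function name `scan_number`, the Thermo-style pattern, and the Waters-style example IDs are assumptions chosen for illustration. It takes the scan number from the nativeID when the ID looks like Thermo's, and otherwise falls back to index + 1, then shows how the fallback changes once a file is subsetted.

```python
import re

# Thermo-style nativeIDs carry a single 1-D scan number.
# Waters-style IDs also contain "scan=", but it restarts per function,
# so it is deliberately not treated as a 1-D scan number here.
_THERMO_RE = re.compile(r"controllerType=\d+ controllerNumber=\d+ scan=(\d+)")


def scan_number(native_id: str, index: int) -> int:
    """Return a 1-D scan number: parsed from the nativeID if possible,
    otherwise the 0-based spectrum index plus one."""
    m = _THERMO_RE.fullmatch(native_id)
    return int(m.group(1)) if m else index + 1


# Illustrative Waters-style IDs for a run with two functions.
full = [
    "function=1 process=0 scan=1",
    "function=2 process=0 scan=1",
    "function=1 process=0 scan=2",
    "function=2 process=0 scan=2",
]
# A subset file that keeps only function 2; its index restarts at 0.
subset = [nid for nid in full if nid.startswith("function=2")]

full_scans = {nid: scan_number(nid, i) for i, nid in enumerate(full)}
subset_scans = {nid: scan_number(nid, i) for i, nid in enumerate(subset)}

for nid in subset:
    print(nid, "full:", full_scans[nid], "subset:", subset_scans[nid])
```

The same spectrum gets a different scan number in the two files, while its nativeID is unchanged, which is the mismatch described above.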

Maybe in the future, the mzML schema could have a scan_number field where tools can put their own scan numbers. But downstream tools would still need to support it.

Best,

Fengchao

chambm · 2024-08-20T19:47:14Z

You could use a userParam. Those are arbitrary and basically unlimited.
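As a rough sketch of the userParam idea, the Python below attaches a tool-defined scan number to a spectrum element using the standard library's `xml.etree.ElementTree`. The parameter name `scan number` and the helper `add_scan_number_param` are hypothetical; mzML does not prescribe a name for this, and the XML namespace and the rest of the spectrum content are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def add_scan_number_param(spectrum: ET.Element, scan_number: int) -> ET.Element:
    """Append a userParam carrying a tool-defined scan number.

    The name "scan number" is an assumption for illustration only;
    any reader would have to agree on it separately.
    """
    return ET.SubElement(
        spectrum,
        "userParam",
        {"name": "scan number", "value": str(scan_number), "type": "xsd:int"},
    )


# The index stays 0-based and contiguous and the nativeID stays in "id";
# the original scan number travels alongside them in the userParam.
spectrum = ET.Element(
    "spectrum",
    {"index": "0", "id": "function=2 process=0 scan=1", "defaultArrayLength": "0"},
)
add_scan_number_param(spectrum, 2)
xml_text = ET.tostring(spectrum, encoding="unicode")
print(xml_text)
```

In a schema-valid file the userParam would need to sit with the spectrum's other params, before elements such as scanList.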

But I think nativeID is specifically intended and useful for mapping across different tools, and for remaining valid when files are filtered or subsetted. It's why I started putting spectrumNativeID in my pepXML output, even though that wasn't an official attribute. :)

fcyu · 2024-08-20T19:51:08Z

> But I think nativeID is specifically intended and useful for mapping across different tools, and for remaining valid when files are filtered or subsetted. It's why I started putting spectrumNativeID in my pepXML output, even though that wasn't an official attribute. :)

Yes, but then I need to maintain native ID -> scan number and scan number -> native ID maps in all the tools that read mzML and raw files, because we index scans using a 1-D scan number.

Best,

Fengchao

chambm · 2024-08-20T19:59:42Z

In my tools I made nativeID a field in the Spectrum class and had a map from nativeID to Spectrum*. When possible, I dropped scan number entirely because it wasn't universally applicable, and when not possible, I parsed it out of the nativeID (or used index if not parseable).
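A minimal Python sketch of that design follows. It is not the actual code from those tools: the `Spectrum` fields and the helper `build_native_id_map` are assumptions for illustration. Each spectrum carries its nativeID, and a dictionary keyed by nativeID gives direct lookup.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Spectrum:
    index: int       # 0-based position in the file being read
    native_id: str   # vendor nativeID, stable across filtering


def build_native_id_map(spectra: List[Spectrum]) -> Dict[str, Spectrum]:
    """Map each nativeID to its Spectrum, rejecting duplicate IDs."""
    by_id: Dict[str, Spectrum] = {}
    for s in spectra:
        if s.native_id in by_id:
            raise ValueError(f"duplicate nativeID: {s.native_id}")
        by_id[s.native_id] = s
    return by_id


# Illustrative Sciex-style IDs; the values are made up.
spectra = [
    Spectrum(0, "sample=1 period=1 cycle=1 experiment=1"),
    Spectrum(1, "sample=1 period=1 cycle=1 experiment=2"),
    Spectrum(2, "sample=1 period=1 cycle=2 experiment=1"),
]
by_id = build_native_id_map(spectra)

# A PSM that stores the nativeID can be resolved without any scan number.
hit = by_id["sample=1 period=1 cycle=1 experiment=2"]
print(hit.index)
```

Because the key is the nativeID rather than the position, the lookup still resolves to the right spectrum if the file is later filtered and the indexes shift.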

chambm · 2024-09-24T14:34:39Z

Hi Fengchao, has this change made it into a released MSFragger?

fcyu · 2024-09-24T14:44:14Z

For Thermo data, the spectrumNativeID is already in the pepXML file. For the others that require changing the scan indexing, I am trying to get it done before the next release.

Best,

Fengchao

fcyu self-assigned this Jul 11, 2024

fcyu mentioned this issue Aug 17, 2024

4.1 writing _uncalibrated.mzML for RAW files #327

Open
