Chapter 6 Annotation

Once you have a peak table or feature table, annotating the peaks helps you interpret them. See this review (Domingo-Almenara et al. 2018) for detailed notes on annotation. The authors proposed five levels for current computational annotation strategies.

  • Level 1: Peak Grouping: MS pseudospectra extraction based on peak shape similarity and peak abundance correlation

  • Level 2: Peak Annotation: Adducts, neutral losses, isotopes, and other mass relationships based on mass distances

  • Level 3: Biochemical knowledge based on putative identification, potential biochemical reactions and related statistical analysis

  • Level 4: Use and integration of tandem MS data based on data-dependent/independent acquisition mode or in silico prediction

  • Level 5: Retention time prediction based on library-available retention index or quantitative structure-retention relationship (QSRR) models.
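Level 1 peak grouping can be illustrated with a short base R sketch. This is not the algorithm of any particular package: the intensity table is simulated, and the correlation cutoff (0.9) and retention time window (2 seconds) are arbitrary assumptions chosen for the example.

```r
# Simulated feature table: rows are features, columns are samples.
set.seed(1)
n_samples <- 20
met_a <- runif(n_samples, 1e4, 1e5)   # abundance of hypothetical metabolite A
met_b <- runif(n_samples, 1e4, 1e5)   # abundance of hypothetical metabolite B
noise <- function(x) x * rnorm(length(x), mean = 1, sd = 0.02)
intensity <- rbind(
  f1 = noise(met_a),         # e.g. the protonated ion of A
  f2 = noise(met_a * 0.3),   # e.g. an adduct of A
  f3 = noise(met_a * 0.1),   # e.g. an isotope peak of A
  f4 = noise(met_b),
  f5 = noise(met_b * 0.5)
)
rt <- c(f1 = 100.2, f2 = 100.5, f3 = 100.1, f4 = 250.0, f5 = 250.4)

# Distance is 1 - correlation across samples. Features that elute more
# than 2 s apart get the maximum distance so they cannot be grouped.
cor_dist <- 1 - cor(t(intensity))
rt_diff <- abs(outer(rt, rt, "-"))
cor_dist[rt_diff > 2] <- 1

# Hierarchical clustering, cut at correlation 0.9 (distance 0.1).
tree <- hclust(as.dist(cor_dist), method = "complete")
pseudospectrum <- cutree(tree, h = 0.1)
print(pseudospectrum)
```

Features that share a group number form one pseudospectrum and are candidates for being ions of the same compound.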

Most software is at level 1 or 2. If we only have a compound's structure, we can predict the ions it forms under different ionization methods. If we have a mass spectrum, we can match it against a database by spectral similarity. In metabolomics, we usually have only mass spectra or mass-to-charge ratios, and a single mass-to-charge ratio is not enough for identification. This is one bottleneck for annotation, so structure prediction is usually performed on MS/MS data.
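Matching a spectrum to a database by similarity is often done with a cosine score on binned spectra. The following is a minimal sketch under stated assumptions: the query and the two reference spectra are invented values, `bin_spectrum` and `cosine_similarity` are hypothetical helper names, and the 0.01 m/z bin width is arbitrary.

```r
# Bin a centroided spectrum (columns mz, intensity) onto a fixed m/z grid.
bin_spectrum <- function(spec, mz_min = 50, mz_max = 500, width = 0.01) {
  n_bins <- ceiling((mz_max - mz_min) / width)
  idx <- as.integer(floor((spec$mz - mz_min) / width)) + 1L
  keep <- idx >= 1L & idx <= n_bins
  v <- numeric(n_bins)
  # Sum the intensities of peaks that fall into the same bin.
  agg <- tapply(spec$intensity[keep], idx[keep], sum)
  v[as.integer(names(agg))] <- as.numeric(agg)
  v
}

# Cosine of the angle between two binned intensity vectors (0 to 1).
cosine_similarity <- function(a, b) {
  x <- bin_spectrum(a)
  y <- bin_spectrum(b)
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Invented example spectra.
query <- data.frame(mz = c(85.0284, 127.0390, 145.0495, 163.0601),
                    intensity = c(20, 100, 45, 10))
reference_library <- list(
  candidate_A = data.frame(mz = c(85.0286, 127.0392, 145.0497, 163.0603),
                           intensity = c(25, 100, 40, 12)),
  candidate_B = data.frame(mz = c(91.0542, 119.0855, 147.0804),
                           intensity = c(100, 60, 30))
)

scores <- sapply(reference_library, cosine_similarity, b = query)
best <- names(which.max(scores))
print(round(scores, 3))
```

Fixed bins can split two nearly identical m/z values that sit on either side of a bin edge, so tools used in practice usually match peaks within a tolerance instead.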

6.1 Issues in annotation

The major issue in annotation is redundant peaks from the same metabolite. Unlike in genomics, the peaks or features from peak selection are not independent of each other. Adducts, in-source fragments and isotopes can lead to misannotation. A common solution is to compare the mass distances between peaks with those expected for known adducts, neutral losses, molecular multimers or multiply charged ions.
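Comparing mass distances can be sketched as a pairwise search over co-eluting peaks. The example below is illustrative only: the peak list is invented, `annotate_pairs` is a hypothetical helper, the tolerances are arbitrary, and the table of differences covers just a few singly charged positive-mode relationships.

```r
# m/z differences between ions of the same compound (positive mode, z = 1).
known_diff <- c(
  "13C isotope"       = 1.003355,   # 13C - 12C
  "[M+NH4]+ - [M+H]+" = 17.026549,  # NH3
  "[M+Na]+ - [M+H]+"  = 21.981944,  # Na - H
  "[M+K]+ - [M+H]+"   = 37.955881   # K - H
)

# Invented peak list: p1 to p4 behave like ions of one compound,
# p5 co-elutes but is unrelated, p6 has a matching mass distance
# to p1 but elutes much later.
peaks <- data.frame(
  id = paste0("p", 1:6),
  mz = c(181.0707, 182.0740, 203.0526, 219.0265, 150.0550, 159.0887),
  rt = c(120.1, 120.2, 120.0, 120.3, 120.1, 300.0),
  stringsAsFactors = FALSE
)

annotate_pairs <- function(peaks, known_diff, mz_tol = 0.002, rt_tol = 2) {
  pairs <- t(combn(nrow(peaks), 2))   # all index pairs with i < j
  out <- list()
  for (k in seq_len(nrow(pairs))) {
    i <- pairs[k, 1]
    j <- pairs[k, 2]
    # Only co-eluting peaks can come from the same compound.
    if (abs(peaks$rt[i] - peaks$rt[j]) > rt_tol) next
    d <- abs(peaks$mz[i] - peaks$mz[j])
    hit <- which(abs(known_diff - d) <= mz_tol)
    if (length(hit) > 0) {
      out[[length(out) + 1]] <- data.frame(
        peak_a = peaks$id[i], peak_b = peaks$id[j],
        relation = names(known_diff)[hit[1]],
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, out)
}

annotation <- annotate_pairs(peaks, known_diff)
print(annotation)
```

The retention time filter matters here: without it, p6 would be linked to p1 by mass distance alone.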

Another issue concerns MS/MS databases. Only 10% of known metabolites in databases have experimental spectral data, so in silico prediction is required. Some work tries to bridge the gap between experimental data, theoretical values (from chemical databases such as ChemSpider) and predictions. Here is a nice review of MS/MS prediction (Hufsky, Scheubert, and Böcker 2014).

6.2 Annotation vs. identification

According to the definition from the Chemical Analysis Working Group of the Metabolomics Standards Initiative (Sumner et al. 2007; Viant et al. 2017), four levels of confidence can be assigned to identification:

  • Level 1 ‘identified metabolites’
  • Level 2 ‘Putatively annotated compounds’
  • Level 3 ‘Putatively characterised compound classes’
  • Level 4 ‘Unknown’

In practice, annotation based on data analysis can reach level 2. For level 1, we need extra evidence such as MS/MS, retention time, accurate mass or 2D NMR spectra to confirm the compounds. However, standards are always required for solid proof.

6.3 MS Database for annotation

6.3.1 MS/MS

  • MoNA: a platform that collects spectra from other open-source databases

  • MassBank

  • GNPS uses the internal correlations in the data and performs network analysis at the peak level, instead of on annotated compounds, to annotate the data (Wang et al. 2016).

  • ReSpect: phytochemicals

  • Metlin is another useful online application for annotation (Guijas et al. 2018).

  • LipidBlast: in silico prediction

  • mzCloud

  • NIST: not free

6.3.2 MS

6.4 Compounds Database

  • PubChem is an open chemistry database at the National Institutes of Health (NIH).

  • Chemspider is a free chemical structure database providing fast text and structure search access to over 67 million structures from hundreds of data sources.

  • ChEBI is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.

  • RefMet is a reference list of metabolite names.

6.5 Software

6.5.1 Adducts list

You can find an adduct list in the commonMZ project.
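An adduct list is typically used to convert a neutral monoisotopic mass into expected m/z values with m/z = (n × M + shift) / |z|. The sketch below is not taken from commonMZ: the `adducts` table and the `adduct_mz` helper are hypothetical and cover only a handful of common ions, with shifts computed from atomic masses rounded to six decimals.

```r
# Monoisotopic atomic masses (Da), rounded to six decimals.
H <- 1.007825
N <- 14.003074
Na <- 22.989770
K <- 38.963707
electron <- 0.000549

# For each adduct: number of molecules M, mass added to n * M, and charge.
adducts <- data.frame(
  name   = c("[M+H]+", "[M+NH4]+", "[M+Na]+", "[M+K]+",
             "[2M+H]+", "[M+2H]2+", "[M-H]-"),
  n_mol  = c(1, 1, 1, 1, 2, 1, 1),
  shift  = c(H - electron, N + 4 * H - electron, Na - electron, K - electron,
             H - electron, 2 * (H - electron), -(H - electron)),
  charge = c(1, 1, 1, 1, 1, 2, -1),
  stringsAsFactors = FALSE
)

# Expected m/z of each adduct for one neutral monoisotopic mass.
adduct_mz <- function(neutral_mass, adducts) {
  mz <- (adducts$n_mol * neutral_mass + adducts$shift) / abs(adducts$charge)
  setNames(round(mz, 4), adducts$name)
}

# Example: glucose, C6H12O6, monoisotopic mass 180.063388 Da.
glucose_ions <- adduct_mz(180.063388, adducts)
print(glucose_ions)
```

Running the calculation in reverse, from an observed m/z to candidate neutral masses, gives one candidate per adduct, which is why a single m/z rarely settles an annotation.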

6.5.2 Molgen

molgen generates all structures (connectivity isomers, constitutions) that correspond to a given molecular formula, with optional further restrictions, e.g. the presence or absence of particular substructures.

6.5.3 Isotope

Isotope pattern prediction
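The idea behind isotope pattern prediction can be shown with a deliberately simplified sketch that considers carbon only. It assumes a 13C natural abundance of about 1.07% and ignores the isotopes of H, N, O, S and other elements, so it is not a substitute for a dedicated tool. The function names are hypothetical.

```r
# Approximate natural abundance of 13C.
p13c <- 0.0107

# Binomial probability of finding k 13C atoms among n_carbon carbons,
# scaled so that the most intense peak is 100.
carbon_isotope_pattern <- function(n_carbon, max_k = 3) {
  k <- 0:max_k
  prob <- dbinom(k, size = n_carbon, prob = p13c)
  data.frame(
    peak = paste0("M+", k),
    mass_shift = k * 1.003355,
    relative_intensity = round(100 * prob / max(prob), 2),
    stringsAsFactors = FALSE
  )
}

# Inverse use: estimate the carbon count from an observed (M+1)/M ratio.
estimate_carbons <- function(ratio_m1_to_m) {
  round(ratio_m1_to_m * (1 - p13c) / p13c)
}

pattern_c6 <- carbon_isotope_pattern(6)
print(pattern_c6)
print(estimate_carbons(0.0649))
```

The inverse calculation is what makes isotope peaks useful for annotation: the M+1 intensity constrains the number of carbons in candidate formulas.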

6.5.4 mfFinder

mfFinder predicts molecular formulas from accurate mass.
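Formula prediction from accurate mass can be sketched as a brute-force search. This is not how mfFinder is implemented: the example is restricted to C, H, N and O, `find_formulas` is a hypothetical helper, the element limits and 5 ppm tolerance are arbitrary, and the two chemical filters are simple heuristics (a hydrogen saturation limit and the parity of H + N for neutral molecules).

```r
# Monoisotopic masses (Da).
mass <- c(C = 12, H = 1.007825, N = 14.003074, O = 15.994915)

# Write element counts as a formula string, e.g. C6H12O6.
format_formula <- function(counts) {
  counts <- counts[counts > 0]
  paste0(names(counts), ifelse(counts == 1, "", counts), collapse = "")
}

find_formulas <- function(target, ppm = 5, max_n = 6, max_o = 10) {
  tol <- target * ppm / 1e6
  hits <- list()
  for (nc in 1:floor(target / mass[["C"]])) {
    for (nn in 0:max_n) {
      for (no in 0:max_o) {
        rest <- target - nc * mass[["C"]] - nn * mass[["N"]] - no * mass[["O"]]
        if (rest < -tol) break          # more oxygen only overshoots further
        nh <- round(rest / mass[["H"]]) # hydrogens fill the remaining mass
        m <- sum(c(nc, nh, nn, no) * mass)
        # Heuristics: saturation limit and even H + N for a neutral molecule.
        valid <- nh <= 2 * nc + nn + 2 && (nh + nn) %% 2 == 0
        if (abs(m - target) <= tol && valid) {
          hits[[length(hits) + 1]] <- data.frame(
            formula = format_formula(c(C = nc, H = nh, N = nn, O = no)),
            mass = m,
            ppm_error = (m - target) / target * 1e6,
            stringsAsFactors = FALSE
          )
        }
      }
    }
  }
  if (length(hits) == 0) return(NULL)
  res <- do.call(rbind, hits)
  res[order(abs(res$ppm_error)), ]
}

# Example: a neutral monoisotopic mass of 180.063388 Da.
candidates <- find_formulas(180.063388)
print(candidates)
```

Even a correct formula is shared by all of its isomers, so a formula hit narrows the candidates without identifying the compound.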

6.5.5 CAMERA

Common annotation for xcms workflow(Kuhl et al. 2012).

6.5.6 RAMClustR

The software can be found here (Broeckling et al. 2014). The package includes a vignette that describes its usage. Use the following code to read it:

vignette('RAMClustR',package = 'RAMClustR')

6.5.7 nontarget

nontarget performs isotope and adduct peak grouping and homologue series detection.

6.5.8 xMSannotator

The software could be found here(Uppal, Walker, and Jones 2017).

6.5.9 MINE

MINE is an open access database of computationally predicted enzyme promiscuity products for untargeted metabolomics. The annotation would be accurate for general compounds database.

6.5.10 InterpretMSSpectrum

This package annotates and interprets deconvoluted mass spectra (mass*intensity pairs) from high-resolution mass spectrometry devices. You can use it to find molecular ions in GC-MS data.

6.5.11 For Ident

For-ident can score an identification with the help of logD (relative retention time) and/or MS/MS.

6.5.12 mzmatch

Use the following code to install this package:

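# Bioconductor dependencies (BiocManager supersedes biocLite)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("xcms", "multtest", "mzR"))
# CRAN dependencies
install.packages(c("rJava", "XML", "snow", "caTools",
   "bitops", "ptw", "gplots", "tcltk2"))
source ("")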

6.5.13 mz.unity

mz.unity detects and explores complex relationships in accurate-mass mass spectrometry data. You can find the source code here (Mahieu et al. 2016).

6.5.14 MAIT

You could find source code here(Fernández-Albert et al. 2014).

6.5.15 ProbMetab

Provides probability ranking to candidate compounds assigned to masses, with the prior assumption of connected sample and additional previous and spectral information modeled by the user. You could find source code here(Silva et al. 2014).

6.5.16 RAMSI

You could find paper here(Baran and Northen 2013).

6.5.17 Sirius

Sirius is a Java-based software framework (Dührkop et al. 2015) for de novo identification of metabolites using single and tandem mass spectrometry. It can be used with CSI:FingerID.

6.5.18 MI-Pack

You can find the Python software here (Weber and Viant 2010).

6.5.19 Plantmat

Excel-library-based prediction for plant metabolites (Qiu et al. 2016).

6.5.20 MetFamily

Shiny app for MS and MS/MS data annotation(Treutler et al. 2016).

6.5.21 Lipidmatch

In silico lipid mass spectrum search (Koelmel et al. 2017).

6.5.22 MolFind

The Java-based MolFind annotates unknown chemical structures by prediction based on RI, ECOM50, drift time and CID spectra (Menikarachchi et al. 2012).

6.5.23 MetFusion

Java based integration of compound identification strategies. You could access the application here(Gerlich and Neumann 2013).

6.5.24 iMet

This online application is a network-based computation method for annotation(Aguilar-Mogas et al. 2017).

6.5.25 Metscape

Metscape uses the Debiased Sparse Partial Correlation (DSPC) algorithm (Basu et al. 2017) to make annotation.

6.5.26 MetFrag

MetFrag could be used to make in silico prediction/match of MS/MS data(Ruttkies et al. 2016).

6.5.27 LipidFrag

LipidFrag could be used to make in silico prediction/match of lipid related MS/MS data(Witting et al. 2017).

6.5.28 MycompoundID

MyCompoundID can be used to search for known and unknown metabolites online (Li et al. 2013).

6.5.29 CFM-ID

CFM-ID uses Metlin’s data to make predictions (Allen et al. 2014).

6.5.30 magma

magma could predict and match MS/MS files.


Domingo-Almenara, Xavier, J. Rafael Montenegro-Burke, H. Paul Benton, and Gary Siuzdak. 2018. “Annotation: A Computational Solution for Streamlining Metabolomics Analysis.” Anal. Chem. 90 (1): 480–89. doi:10.1021/acs.analchem.7b03929.

Hufsky, Franziska, Kerstin Scheubert, and Sebastian Böcker. 2014. “Computational Mass Spectrometry for Small-Molecule Fragmentation.” TrAC Trends in Analytical Chemistry 53 (January): 41–48. doi:10.1016/j.trac.2013.09.008.

Sumner, Lloyd W., Alexander Amberg, Dave Barrett, Michael H. Beale, Richard Beger, Clare A. Daykin, Teresa W.-M. Fan, et al. 2007. “Proposed Minimum Reporting Standards for Chemical Analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI).” Metabolomics 3 (3): 211–21. doi:10.1007/s11306-007-0082-2.

Viant, Mark R, Irwin J Kurland, Martin R Jones, and Warwick B Dunn. 2017. “How Close Are We to Complete Annotation of Metabolomes?” Current Opinion in Chemical Biology, Omics, 36 (February): 64–69. doi:10.1016/j.cbpa.2017.01.001.

Wang, Mingxun, Jeremy J. Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don Duy Nguyen, et al. 2016. “Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking.” Nat. Biotechnol. 34 (8): 828–37. doi:10.1038/nbt.3597.

Guijas, Carlos, J. Rafael Montenegro-Burke, Xavier Domingo-Almenara, Amelia Palermo, Benedikt Warth, Gerrit Hermann, Gunda Koellensperger, et al. 2018. “METLIN: A Technology Platform for Identifying Knowns and Unknowns.” Anal. Chem. 90 (5): 3156–64. doi:10.1021/acs.analchem.7b04424.

Kuhl, Carsten, Ralf Tautenhahn, Christoph Böttcher, Tony R. Larson, and Steffen Neumann. 2012. “CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets.” Anal. Chem. 84 (1): 283–89. doi:10.1021/ac202450g.

Broeckling, C. D., F. A. Afsar, S. Neumann, A. Ben-Hur, and J. E. Prenni. 2014. “RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching-Based Annotation for Metabolomics Data.” Anal. Chem. 86 (14): 6812–7. doi:10.1021/ac501530d.

Uppal, Karan, Douglas I. Walker, and Dean P. Jones. 2017. “XMSannotator: An R Package for Network-Based Annotation of High-Resolution Metabolomics Data.” Anal. Chem. 89 (2): 1063–7. doi:10.1021/acs.analchem.6b01214.

Mahieu, Nathaniel G., Jonathan L. Spalding, Susan J. Gelman, and Gary J. Patti. 2016. “Defining and Detecting Complex Peak Relationships in Mass Spectral Data: The Mz.Unity Algorithm.” Anal. Chem. 88 (18): 9037–46. doi:10.1021/acs.analchem.6b01702.

Fernández-Albert, Francesc, Rafael Llorach, Cristina Andrés-Lacueva, and Alexandre Perera. 2014. “An R Package to Analyse LC/MS Metabolomic Data: MAIT (Metabolite Automatic Identification Toolkit).” Bioinformatics 30 (13): 1937–9. doi:10.1093/bioinformatics/btu136.

Silva, Ricardo R., Fabien Jourdan, Diego M. Salvanha, Fabien Letisse, Emilien L. Jamin, Simone Guidetti-Gonzalez, Carlos A. Labate, and Ricardo Z. N. Vêncio. 2014. “ProbMetab: An R Package for Bayesian Probabilistic Annotation of LCMS-Based Metabolomics.” Bioinformatics 30 (9): 1336–7. doi:10.1093/bioinformatics/btu019.

Baran, Richard, and Trent R. Northen. 2013. “Robust Automated Mass Spectra Interpretation and Chemical Formula Calculation Using Mixed Integer Linear Programming.” Anal. Chem. 85 (20): 9777–84. doi:10.1021/ac402180c.

Dührkop, Kai, Huibin Shen, Marvin Meusel, Juho Rousu, and Sebastian Böcker. 2015. “Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID.” PNAS 112 (41): 12580–5. doi:10.1073/pnas.1509788112.

Weber, Ralf J. M., and Mark R. Viant. 2010. “MI-Pack: Increased Confidence of Metabolite Identification in Mass Spectra by Integrating Accurate Masses and Metabolic Pathways.” Chemometrics and Intelligent Laboratory Systems, OMICS, 104 (1): 75–82. doi:10.1016/j.chemolab.2010.04.010.

Qiu, Feng, Dennis D. Fine, Daniel J. Wherritt, Zhentian Lei, and Lloyd W. Sumner. 2016. “PlantMAT: A Metabolomics Tool for Predicting the Specialized Metabolic Potential of a System and for Large-Scale Metabolite Identifications.” Anal. Chem. 88 (23): 11373–83. doi:10.1021/acs.analchem.6b00906.

Treutler, Hendrik, Hiroshi Tsugawa, Andrea Porzel, Karin Gorzolka, Alain Tissier, Steffen Neumann, and Gerd Ulrich Balcke. 2016. “Discovering Regulated Metabolite Families in Untargeted Metabolomics Studies.” Anal. Chem. 88 (16): 8082–90. doi:10.1021/acs.analchem.6b01569.

Koelmel, Jeremy P., Nicholas M. Kroeger, Candice Z. Ulmer, John A. Bowden, Rainey E. Patterson, Jason A. Cochran, Christopher W. W. Beecher, Timothy J. Garrett, and Richard A. Yost. 2017. “LipidMatch: An Automated Workflow for Rule-Based Lipid Identification Using Untargeted High-Resolution Tandem Mass Spectrometry Data.” BMC Bioinformatics 18 (July): 331. doi:10.1186/s12859-017-1744-3.

Menikarachchi, Lochana C., Shannon Cawley, Dennis W. Hill, L. Mark Hall, Lowell Hall, Steven Lai, Janine Wilder, and David F. Grant. 2012. “MolFind: A Software Package Enabling HPLC/MS-Based Identification of Unknown Chemical Structures.” Anal. Chem. 84 (21): 9388–94. doi:10.1021/ac302048x.

Gerlich, Michael, and Steffen Neumann. 2013. “MetFusion: Integration of Compound Identification Strategies.” J. Mass Spectrom. 48 (3): 291–98. doi:10.1002/jms.3123.

Aguilar-Mogas, Antoni, Marta Sales-Pardo, Miriam Navarro, Roger Guimerà, and Oscar Yanes. 2017. “IMet: A Network-Based Computational Tool To Assist in the Annotation of Metabolites from Tandem Mass Spectra.” Anal. Chem. 89 (6): 3474–82. doi:10.1021/acs.analchem.6b04512.

Basu, Sumanta, William Duren, Charles R. Evans, Charles F. Burant, George Michailidis, and Alla Karnovsky. 2017. “Sparse Network Modeling and Metscape-Based Visualization Methods for the Analysis of Large-Scale Metabolomics Data.” Bioinformatics 33 (10): 1545–53. doi:10.1093/bioinformatics/btx012.

Ruttkies, Christoph, Emma L. Schymanski, Sebastian Wolf, Juliane Hollender, and Steffen Neumann. 2016. “MetFrag Relaunched: Incorporating Strategies Beyond in Silico Fragmentation.” Journal of Cheminformatics 8 (January): 3. doi:10.1186/s13321-016-0115-9.

Witting, Michael, Christoph Ruttkies, Steffen Neumann, and Philippe Schmitt-Kopplin. 2017. “LipidFrag: Improving Reliability of in Silico Fragmentation of Lipids and Application to the Caenorhabditis Elegans Lipidome.” PLOS ONE 12 (3): e0172311. doi:10.1371/journal.pone.0172311.

Li, Liang, Ronghong Li, Jianjun Zhou, Azeret Zuniga, Avalyn E. Stanislaus, Yiman Wu, Tao Huan, et al. 2013. “MyCompoundID: Using an Evidence-Based Metabolome Library for Metabolite Identification.” Anal. Chem. 85 (6): 3401–8. doi:10.1021/ac400099b.

Allen, Felicity, Allison Pon, Michael Wilson, Russ Greiner, and David Wishart. 2014. “CFM-ID: A Web Server for Annotation, Spectrum Prediction and Metabolite Identification from Tandem Mass Spectra.” Nucleic Acids Res 42 (W1): W94–W99. doi:10.1093/nar/gku436.