Chapter 6 Annotation
Once you have a peak table or feature table, annotating the peaks helps you interpret them. See this review (Domingo-Almenara et al. 2018) for detailed notes on annotation. The authors proposed five levels to describe current computational annotation strategies:
Level 1: Peak grouping: MS pseudospectra extraction based on peak shape similarity and peak abundance correlation
Level 2: Peak annotation: adducts, neutral losses, isotopes, and other mass relationships based on mass distances
Level 3: Biochemical knowledge based on putative identification, potential biochemical reactions, and related statistical analysis
Level 4: Use and integration of tandem MS data based on data-dependent/independent acquisition modes or in silico prediction
Level 5: Retention time prediction based on library-available retention indices or quantitative structure-retention relationship (QSRR) models
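As a minimal sketch of the Level 1 idea, the R code below groups features that elute together and whose abundances correlate across samples. The feature table, retention times, tolerance values, and the function name `group_peaks` are assumptions made for illustration only; published tools apply more elaborate rules (for example peak shape similarity).

```r
# Hypothetical feature table: rows are features, columns are samples.
# All values are made up for illustration.
intensity <- rbind(
  f1 = c(100, 200, 300, 400, 500),
  f2 = c( 11,  19,  31,  41,  50),  # tracks f1 (e.g. an isotope or adduct)
  f3 = c(500, 100, 400, 200, 300),
  f4 = c( 52,   9,  41,  19,  30)   # tracks f3
)
rt <- c(f1 = 120.0, f2 = 120.4, f3 = 120.9, f4 = 121.1)  # seconds

group_peaks <- function(intensity, rt, rt_tol = 2, cor_cutoff = 0.9) {
  # Pearson correlation between features across samples
  cor_mat <- cor(t(intensity))
  # Pairwise retention time differences
  rt_diff <- abs(outer(rt, rt, "-"))
  # Two features are linked when they co-elute and co-vary
  linked <- cor_mat >= cor_cutoff & rt_diff <= rt_tol
  # Single-linkage clustering on a 0/1 distance returns the connected
  # components of the link graph
  d <- as.dist(1 - linked)
  cutree(hclust(d, method = "single"), h = 0.5)
}

groups <- group_peaks(intensity, rt)
print(groups)
```

Features that receive the same group label form one pseudospectrum candidate. Although all four features co-elute here, the correlation filter separates them into two groups.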
Most software is at level 1 or 2. If we only have compound structures, we can predict the ions formed under different ionization methods. If we have a mass spectrum, we can match it against a database by similarity analysis. In metabolomics, we only have mass spectra or mass-to-charge ratios, and a single mass-to-charge ratio is not enough for identification. That is one bottleneck for annotation, so prediction is generally performed on MS/MS data.
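Matching a spectrum to a database by similarity can be sketched with a cosine score. The function below is an illustration under stated assumptions, not the scoring used by any particular database: spectra are two-column matrices (`mz`, `intensity`), peaks are paired greedily within an assumed tolerance `mz_tol`, and the example spectra are invented.

```r
# Cosine similarity between two centroided spectra.
# Each spectrum is a matrix with columns "mz" and "intensity".
cosine_similarity <- function(query, reference, mz_tol = 0.01) {
  matched <- numeric(nrow(query))     # reference intensity paired to each query peak
  used <- logical(nrow(reference))    # each reference peak is used at most once
  for (i in seq_len(nrow(query))) {
    diffs <- abs(reference[, "mz"] - query[i, "mz"])
    diffs[used] <- Inf
    j <- which.min(diffs)             # greedy: closest unused reference peak
    if (length(j) == 1 && diffs[j] <= mz_tol) {
      matched[i] <- reference[j, "intensity"]
      used[j] <- TRUE
    }
  }
  # Unmatched peaks add nothing to the dot product but still count in the norms
  qry_norm <- sqrt(sum(query[, "intensity"]^2))
  ref_norm <- sqrt(sum(reference[, "intensity"]^2))
  sum(query[, "intensity"] * matched) / (qry_norm * ref_norm)
}

# Invented spectra for illustration
spec_a <- cbind(mz = c(85.030, 127.040, 145.050), intensity = c(30, 100, 55))
spec_b <- cbind(mz = c(85.031, 127.039, 163.060), intensity = c(28, 95, 60))
spec_c <- cbind(mz = c(60.020, 99.010), intensity = c(100, 40))

sim_self    <- cosine_similarity(spec_a, spec_a)  # identical spectra
sim_partial <- cosine_similarity(spec_a, spec_b)  # two of three peaks shared
sim_none    <- cosine_similarity(spec_a, spec_c)  # no shared peaks
```

A score near 1 indicates nearly identical spectra and a score of 0 means no peaks matched. Real tools often weight intensities or m/z values before scoring.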
6.1 Issues in annotation
The major issue in annotation is redundant peaks from the same metabolite. Unlike in genomics, the peaks or features from peak selection are not independent of each other. Adducts, in-source fragments, and isotopes can lead to misannotation. A common solution is to compare the mass distances between peaks with those expected for known adducts, neutral losses, molecular multimers, or multiply charged ions.
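The mass-distance comparison can be sketched as follows. This is only an illustration: the short list of mass distances, the tolerances, the function name `annotate_pairs`, and the example peaks (ions expected for glucose, neutral mass 180.063388, plus one unrelated peak) are assumptions, and a real workflow would use a curated list.

```r
# Illustrative mass distances in Da between related ions
mass_distances <- c(
  "13C isotope"        = 1.003355,   # 13C - 12C
  "[M+NH4]+ vs [M+H]+" = 17.026549,  # NH3
  "H2O loss"           = 18.010565,
  "[M+Na]+ vs [M+H]+"  = 21.981944   # Na - H
)

annotate_pairs <- function(mz, rt, rt_tol = 2, mz_tol = 0.005) {
  n <- length(mz)
  if (n < 2) return(NULL)
  hits <- list()
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Related ions from one metabolite should co-elute
      if (abs(rt[i] - rt[j]) > rt_tol) next
      delta <- abs(mz[i] - mz[j])
      k <- which(abs(delta - mass_distances) <= mz_tol)
      if (length(k) > 0) {
        hits[[length(hits) + 1]] <- data.frame(
          mz_low   = min(mz[i], mz[j]),
          mz_high  = max(mz[i], mz[j]),
          relation = names(mass_distances)[k],
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, hits)
}

# [M+H]+, its 13C isotope, and [M+Na]+ of glucose, plus an unrelated peak
mz <- c(181.070664, 182.074019, 203.052608, 215.100000)
rt <- c(60.1, 60.2, 60.0, 300.0)
related <- annotate_pairs(mz, rt)
print(related)
```

Peaks linked this way can be collapsed to a single representative feature before statistical analysis, which reduces redundancy.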
Another issue is the MS/MS database. Only 10% of known metabolites in databases have experimental spectral data, so in silico prediction is required. Some work tries to bridge the gap between experimental data, theoretical values (from chemical databases such as ChemSpider), and prediction. Here is a nice review of MS/MS prediction (Hufsky, Scheubert, and Böcker 2014).
6.2 Annotation vs. identification
According to the definition from the Chemical Analysis Working Group of the Metabolomics Standards Initiative (Sumner et al. 2007; Viant et al. 2017), four levels of confidence can be assigned to identification:
- Level 1 ‘Identified metabolites’
- Level 2 ‘Putatively annotated compounds’
- Level 3 ‘Putatively characterised compound classes’
- Level 4 ‘Unknown’
In practice, annotation based on data analysis alone can reach level 2. For level 1, we need extra methods such as MS/MS, retention time, accurate mass, 2D NMR spectra, and so on to confirm the compounds. However, standards are always required for solid proof.
6.3 MS Database for annotation
- MoNA: a platform that collects other open-source databases
- LipidBlast: in silico prediction
- NIST: not free
6.4 Compounds Database
PubChem is an open chemistry database at the National Institutes of Health (NIH).
ChemSpider is a free chemical structure database providing fast text and structure search access to over 67 million structures from hundreds of data sources.
ChEBI is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
RefMet is a reference list of metabolite names.
6.5.1 Adducts list
You can find a list of adducts in the commonMZ project.
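To show how such a list is used, the sketch below computes expected m/z values for a neutral monoisotopic mass. The small adduct table is an illustrative assumption rather than the commonMZ list itself, and `adduct_mz` is a hypothetical helper name. The shifts already include the electron mass correction.

```r
# Illustrative adduct table (shift in Da, charge as absolute value)
adducts <- data.frame(
  name   = c("[M+H]+", "[M+NH4]+", "[M+Na]+", "[M+K]+", "[M-H]-", "[2M+H]+"),
  n_mol  = c(1, 1, 1, 1, 1, 2),
  charge = c(1, 1, 1, 1, 1, 1),
  shift  = c(1.007276, 18.033823, 22.989218, 38.963158, -1.007276, 1.007276),
  stringsAsFactors = FALSE
)

# m/z = (n_mol * M + shift) / charge
adduct_mz <- function(neutral_mass, adducts) {
  mz <- (adducts$n_mol * neutral_mass + adducts$shift) / adducts$charge
  setNames(mz, adducts$name)
}

glucose <- 180.063388  # monoisotopic mass of C6H12O6
glucose_ions <- adduct_mz(glucose, adducts)
print(round(glucose_ions, 4))
```

Comparing observed m/z values against such a table, within a ppm tolerance, is the usual first step of adduct annotation.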
molgen generates all structures (connectivity isomers, constitutions) that correspond to a given molecular formula, with optional further restrictions, e.g. presence or absence of particular substructures.
Isotope pattern prediction
mfFinder predicts formulas based on accurate mass.
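The idea behind formula prediction from accurate mass can be illustrated with a brute-force search over C, H, N, and O counts. This is a sketch under stated assumptions and not how mfFinder is implemented: the element set, the upper bounds in `max_n`, the 5 ppm tolerance, and the function name `find_formulas` are all chosen for illustration, and no chemical plausibility rules are applied.

```r
# Monoisotopic masses in Da
mono <- c(C = 12.000000, H = 1.007825, N = 14.003074, O = 15.994915)

find_formulas <- function(mass, ppm = 5,
                          max_n = c(C = 20, H = 40, N = 5, O = 10)) {
  grid <- expand.grid(C = 0:max_n[["C"]], N = 0:max_n[["N"]], O = 0:max_n[["O"]])
  # Mass left for hydrogen after placing C, N and O; only the nearest
  # integer H count can fall inside a ppm-level tolerance
  rest <- mass - (grid$C * mono[["C"]] + grid$N * mono[["N"]] + grid$O * mono[["O"]])
  grid$H <- round(rest / mono[["H"]])
  grid <- grid[grid$H >= 0 & grid$H <= max_n[["H"]], ]
  calc <- as.vector(as.matrix(grid[, names(mono)]) %*% mono)
  error_ppm <- (calc - mass) / mass * 1e6
  keep <- abs(error_ppm) <= ppm
  # Write element counts as a formula string, omitting zeros and ones
  fmt <- function(el, n) ifelse(n == 0, "", ifelse(n == 1, el, paste0(el, n)))
  out <- data.frame(
    formula   = paste0(fmt("C", grid$C), fmt("H", grid$H),
                       fmt("N", grid$N), fmt("O", grid$O))[keep],
    mass      = calc[keep],
    error_ppm = error_ppm[keep],
    stringsAsFactors = FALSE
  )
  out[order(abs(out$error_ppm)), ]
}

candidates <- find_formulas(180.063388)
print(candidates)
```

Even at a few ppm, several formulas can fit one mass, which is why isotope patterns and MS/MS data are used to rank the candidates.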
vignette('RAMClustR',package = 'RAMClustR')
nontarget: isotope and adduct peak grouping, homologue series detection
MINE is an open access database of computationally predicted enzyme promiscuity products for untargeted metabolomics. The annotation would be accurate for general compounds database.
This package annotates and interprets deconvoluted mass spectra (mass*intensity pairs) from high-resolution mass spectrometry devices. You can use it to find molecular ions for GC-MS.
6.5.11 For Ident
For-ident can give a score for identification with the help of logD (relative retention time) and/or MS/MS.
Use the following code to install this package:
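# Note: biocLite() is the legacy Bioconductor installer; recent Bioconductor
# releases use BiocManager::install() instead.
source("http://bioconductor.org/biocLite.R")
biocLite(c("xcms", "multtest", "mzR"))
install.packages(c("rJava", "XML", "snow", "caTools", "bitops", "ptw", "gplots", "tcltk2"))
source("http://puma.ibls.gla.ac.uk/mzmatch.R/install_mzmatch.R")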
Provides probability ranking to candidate compounds assigned to masses, with the prior assumption of connected sample and additional previous and spectral information modeled by the user. You can find the source code here (Silva et al. 2014).
You can find the paper here (Baran and Northen 2013).
Sirius is a Java-based software framework (Dührkop et al. 2015) for de novo identification of metabolites using single and tandem mass spectrometry. It can be used with CSI:FingerID.
magma can predict and match MS/MS files.
Domingo-Almenara, Xavier, J. Rafael Montenegro-Burke, H. Paul Benton, and Gary Siuzdak. 2018. “Annotation: A Computational Solution for Streamlining Metabolomics Analysis.” Anal. Chem. 90 (1): 480–89. doi:10.1021/acs.analchem.7b03929.
Hufsky, Franziska, Kerstin Scheubert, and Sebastian Böcker. 2014. “Computational Mass Spectrometry for Small-Molecule Fragmentation.” TrAC Trends in Analytical Chemistry 53 (January): 41–48. doi:10.1016/j.trac.2013.09.008.
Sumner, Lloyd W., Alexander Amberg, Dave Barrett, Michael H. Beale, Richard Beger, Clare A. Daykin, Teresa W.-M. Fan, et al. 2007. “Proposed Minimum Reporting Standards for Chemical Analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI).” Metabolomics 3 (3): 211–21. doi:10.1007/s11306-007-0082-2.
Viant, Mark R, Irwin J Kurland, Martin R Jones, and Warwick B Dunn. 2017. “How Close Are We to Complete Annotation of Metabolomes?” Current Opinion in Chemical Biology, Omics, 36 (February): 64–69. doi:10.1016/j.cbpa.2017.01.001.
Wang, Mingxun, Jeremy J. Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don Duy Nguyen, et al. 2016. “Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking.” Nat. Biotechnol. 34 (8): 828–37. doi:10.1038/nbt.3597.
Guijas, Carlos, J. Rafael Montenegro-Burke, Xavier Domingo-Almenara, Amelia Palermo, Benedikt Warth, Gerrit Hermann, Gunda Koellensperger, et al. 2018. “METLIN: A Technology Platform for Identifying Knowns and Unknowns.” Anal. Chem. 90 (5): 3156–64. doi:10.1021/acs.analchem.7b04424.
Kuhl, Carsten, Ralf Tautenhahn, Christoph Böttcher, Tony R. Larson, and Steffen Neumann. 2012. “CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets.” Anal. Chem. 84 (1): 283–89. doi:10.1021/ac202450g.
Broeckling, C. D., F. A. Afsar, S. Neumann, A. Ben-Hur, and J. E. Prenni. 2014. “RAMClust: A Novel Feature Clustering Method Enables Spectral-Matching-Based Annotation for Metabolomics Data.” Anal. Chem. 86 (14): 6812–7. doi:10.1021/ac501530d.
Uppal, Karan, Douglas I. Walker, and Dean P. Jones. 2017. “XMSannotator: An R Package for Network-Based Annotation of High-Resolution Metabolomics Data.” Anal. Chem. 89 (2): 1063–7. doi:10.1021/acs.analchem.6b01214.
Mahieu, Nathaniel G., Jonathan L. Spalding, Susan J. Gelman, and Gary J. Patti. 2016. “Defining and Detecting Complex Peak Relationships in Mass Spectral Data: The Mz.Unity Algorithm.” Anal. Chem. 88 (18): 9037–46. doi:10.1021/acs.analchem.6b01702.
Fernández-Albert, Francesc, Rafael Llorach, Cristina Andrés-Lacueva, and Alexandre Perera. 2014. “An R Package to Analyse LC/MS Metabolomic Data: MAIT (Metabolite Automatic Identification Toolkit).” Bioinformatics 30 (13): 1937–9. doi:10.1093/bioinformatics/btu136.
Silva, Ricardo R., Fabien Jourdan, Diego M. Salvanha, Fabien Letisse, Emilien L. Jamin, Simone Guidetti-Gonzalez, Carlos A. Labate, and Ricardo Z. N. Vêncio. 2014. “ProbMetab: An R Package for Bayesian Probabilistic Annotation of LCMS-Based Metabolomics.” Bioinformatics 30 (9): 1336–7. doi:10.1093/bioinformatics/btu019.
Baran, Richard, and Trent R. Northen. 2013. “Robust Automated Mass Spectra Interpretation and Chemical Formula Calculation Using Mixed Integer Linear Programming.” Anal. Chem. 85 (20): 9777–84. doi:10.1021/ac402180c.
Dührkop, Kai, Huibin Shen, Marvin Meusel, Juho Rousu, and Sebastian Böcker. 2015. “Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID.” PNAS 112 (41): 12580–5. doi:10.1073/pnas.1509788112.
Weber, Ralf J. M., and Mark R. Viant. 2010. “MI-Pack: Increased Confidence of Metabolite Identification in Mass Spectra by Integrating Accurate Masses and Metabolic Pathways.” Chemometrics and Intelligent Laboratory Systems, OMICS, 104 (1): 75–82. doi:10.1016/j.chemolab.2010.04.010.
Qiu, Feng, Dennis D. Fine, Daniel J. Wherritt, Zhentian Lei, and Lloyd W. Sumner. 2016. “PlantMAT: A Metabolomics Tool for Predicting the Specialized Metabolic Potential of a System and for Large-Scale Metabolite Identifications.” Anal. Chem. 88 (23): 11373–83. doi:10.1021/acs.analchem.6b00906.
Treutler, Hendrik, Hiroshi Tsugawa, Andrea Porzel, Karin Gorzolka, Alain Tissier, Steffen Neumann, and Gerd Ulrich Balcke. 2016. “Discovering Regulated Metabolite Families in Untargeted Metabolomics Studies.” Anal. Chem. 88 (16): 8082–90. doi:10.1021/acs.analchem.6b01569.
Koelmel, Jeremy P., Nicholas M. Kroeger, Candice Z. Ulmer, John A. Bowden, Rainey E. Patterson, Jason A. Cochran, Christopher W. W. Beecher, Timothy J. Garrett, and Richard A. Yost. 2017. “LipidMatch: An Automated Workflow for Rule-Based Lipid Identification Using Untargeted High-Resolution Tandem Mass Spectrometry Data.” BMC Bioinformatics 18 (July): 331. doi:10.1186/s12859-017-1744-3.
Menikarachchi, Lochana C., Shannon Cawley, Dennis W. Hill, L. Mark Hall, Lowell Hall, Steven Lai, Janine Wilder, and David F. Grant. 2012. “MolFind: A Software Package Enabling HPLC/MS-Based Identification of Unknown Chemical Structures.” Anal. Chem. 84 (21): 9388–94. doi:10.1021/ac302048x.
Gerlich, Michael, and Steffen Neumann. 2013. “MetFusion: Integration of Compound Identification Strategies.” J. Mass Spectrom. 48 (3): 291–98. doi:10.1002/jms.3123.
Aguilar-Mogas, Antoni, Marta Sales-Pardo, Miriam Navarro, Roger Guimerà, and Oscar Yanes. 2017. “IMet: A Network-Based Computational Tool To Assist in the Annotation of Metabolites from Tandem Mass Spectra.” Anal. Chem. 89 (6): 3474–82. doi:10.1021/acs.analchem.6b04512.
Basu, Sumanta, William Duren, Charles R. Evans, Charles F. Burant, George Michailidis, and Alla Karnovsky. 2017. “Sparse Network Modeling and Metscape-Based Visualization Methods for the Analysis of Large-Scale Metabolomics Data.” Bioinformatics 33 (10): 1545–53. doi:10.1093/bioinformatics/btx012.
Ruttkies, Christoph, Emma L. Schymanski, Sebastian Wolf, Juliane Hollender, and Steffen Neumann. 2016. “MetFrag Relaunched: Incorporating Strategies Beyond in Silico Fragmentation.” Journal of Cheminformatics 8 (January): 3. doi:10.1186/s13321-016-0115-9.
Witting, Michael, Christoph Ruttkies, Steffen Neumann, and Philippe Schmitt-Kopplin. 2017. “LipidFrag: Improving Reliability of in Silico Fragmentation of Lipids and Application to the Caenorhabditis Elegans Lipidome.” PLOS ONE 12 (3): e0172311. doi:10.1371/journal.pone.0172311.
Li, Liang, Ronghong Li, Jianjun Zhou, Azeret Zuniga, Avalyn E. Stanislaus, Yiman Wu, Tao Huan, et al. 2013. “MyCompoundID: Using an Evidence-Based Metabolome Library for Metabolite Identification.” Anal. Chem. 85 (6): 3401–8. doi:10.1021/ac400099b.
Allen, Felicity, Allison Pon, Michael Wilson, Russ Greiner, and David Wishart. 2014. “CFM-ID: A Web Server for Annotation, Spectrum Prediction and Metabolite Identification from Tandem Mass Spectra.” Nucleic Acids Res 42 (W1): W94–W99. doi:10.1093/nar/gku436.