# Chapter 5 Peak selection

Once peaks have been corrected across samples, the next step is to find the differences between two groups. For comparisons among more than two groups, you could use ANOVA or the Kruskal-Wallis test instead. The basic idea behind the statistical analysis is to find meaningful differences between groups and extract the corresponding ions or peak groups.

So how do we find the differences? Most metabolomics software handles this task with a t-test and reports p-values and fold changes. If you compare two groups on a single peak, that is fine. However, if you compare two groups on thousands of peaks, any statistics textbook will warn you about false positives. For one comparison at a significance level of 0.05, there is a 5% chance of a false positive result when no real difference exists. For two independent comparisons, the chance of at least one false positive is $$1-0.95^2 = 0.0975$$. For 10 comparisons, it is $$1-0.95^{10} = 0.4012631$$. For 100 comparisons, it is $$1-0.95^{100} = 0.9940795$$. With thousands of peaks, you are almost certain to report some false positives.
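The arithmetic above can be reproduced with a minimal Python sketch. The function name `family_wise_error_rate` is hypothetical, and the formula assumes that the tests are independent, which is rarely strictly true for correlated peaks.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Reproduce the numbers quoted in the text for alpha = 0.05.
for n_tests in (1, 2, 10, 100):
    rate = family_wise_error_rate(0.05, n_tests)
    print(f"{n_tests:>4} comparisons: {rate:.7f}")
```

The rate grows quickly with the number of tests, which is why a per-peak threshold of 0.05 is not enough when thousands of peaks are compared.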

In omics studies, multiple testing is usually handled by controlling the false discovery rate (FDR). I suggest using q-values to control the FDR. If we select every peak with a q-value below 0.05, we expect fewer than 5% of the selected peaks across the whole dataset to be wrong selections. We could also use the local false discovery rate, which gives the FDR for an individual peak. However, such values are hard to estimate accurately.
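As an illustration of FDR control, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. This is a stand-in and not the q-value estimator itself: BH-adjusted p-values are a conservative approximation that assumes every null hypothesis could be true. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per peak.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adjusted = benjamini_hochberg(p_values)
# Keep the peaks whose adjusted value is below the chosen FDR level.
selected = [i for i, q in enumerate(adjusted) if q < 0.05]
print(selected)
```

In this example five peaks have raw p-values below 0.05, but only the first two remain after adjustment.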

Karin Ortmayr and colleagues suggested that fold change might be a better criterion than p-values for finding the differences (Ortmayr et al. 2016).
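For reference, a fold change for one peak can be computed as the ratio of group means, usually reported on a log2 scale. The sketch below is a generic illustration and not the uncertainty budgeting approach of the cited paper. The function name and the intensities are hypothetical, and all intensities are assumed to be positive.

```python
import math
from statistics import mean


def log2_fold_change(case, control):
    """log2 of the ratio of mean intensities (case over control)."""
    return math.log2(mean(case) / mean(control))


# Hypothetical peak intensities for one peak in two groups.
case = [200, 220, 180]
control = [100, 110, 90]
print(log2_fold_change(case, control))
```

A value of 1 means the mean intensity doubled in the case group, and a value of -1 means it halved.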

## 5.1 Peak misidentification

• Isomers

Use separation methods such as chromatography, ion mobility MS, or MS/MS. Reversed-phase ion-pairing chromatography and HILIC are useful, and chemical derivatization is another option.

• Interfering compounds

20 ppm is the minimum acceptable resolution and mass accuracy.
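To make the 20 ppm figure concrete, the sketch below computes the mass error in ppm between an observed and a theoretical m/z and checks it against a tolerance. The function names and the m/z values are hypothetical, and the default tolerance is simply the 20 ppm mentioned above.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 20.0) -> bool:
    """True if the absolute mass error is within tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical m/z values: errors of about 15 ppm and 25 ppm at m/z 200.
print(within_tolerance(200.003, 200.0))
print(within_tolerance(200.005, 200.0))
```

At m/z 200, a 20 ppm window is only 0.004 wide, so any interfering compound inside that window cannot be separated by mass alone.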

## 5.2 Software

• IPO: a tool for automated optimization of XCMS parameters.

• xcms: LC/MS and GC/MS data analysis.

• Warpgroup: chromatogram subregion detection, consensus integration bound determination, and accurate missing value integration.

• Paired Mass Distance (PMD) analysis for GC/LC-MS based nontarget analysis.

• FTMSVisualization: a suite of tools for visualizing complex mixture FT-MS data.

### References

Ortmayr, Karin, Verena Charwat, Cornelia Kasper, Stephan Hann, and Gunda Koellensperger. 2016. “Uncertainty Budgeting in Fold Change Determination and Implications for Non-Targeted Metabolomics Studies in Model Systems.” Analyst 142 (1): 80–90. doi:10.1039/C6AN01342B.