<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>blog on Miao Yu | 于淼</title>
    <link>https://yufree.cn/en/</link>
    <description>Recent content in blog on Miao Yu | 于淼</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 22 Feb 2017 00:00:00 +0000</lastBuildDate><atom:link href="https://yufree.cn/en/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Single File Injections: A &#39;Fishing-Style&#39; Paradigm Shift for High-Throughput LC-MS Analysis</title>
      <link>https://yufree.cn/en/2025/08/14/single-file-injections-a-fishing-style-paradigm-shift-for-high-throughput-lc-ms-analysis/</link>
      <pubDate>Thu, 14 Aug 2025 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2025/08/14/single-file-injections-a-fishing-style-paradigm-shift-for-high-throughput-lc-ms-analysis/</guid>
      <description>&lt;p&gt;In non-targeted metabolomics or environmental non-targeted analysis, throughput is a key factor affecting data quality. Typically, we use the same chromatography-mass spectrometry method to run a sample sequence. The sequence executes sample analysis tasks sequentially, meaning the next sample is injected only after the previous one has completely finished running, with each sample generating a single data file. In this process, even if we compress the chromatographic separation time to 15 minutes, we can run fewer than one hundred samples a day at maximum capacity. When considering the quality control (QC) samples included in the injection sequence, the actual number of samples that can be analyzed is even smaller. If thousands of samples need to be processed, the analysis can easily stretch over a month. An instrument running continuously for a year without maintenance can handle fewer than ten thousand actual samples, not to mention that each sample might need to be run on different chromatography columns and in both positive and negative ion modes. Under this technical limitation, it is difficult for non-targeted metabolomics or environmental non-targeted analysis to match the throughput of other omics fields, which can handle thousands or tens of thousands of samples per day. Furthermore, as the number of samples increases, batch effects and instrument instability require additional correction, making data quality control exceedingly complex and a true bottleneck.&lt;/p&gt;
&lt;p&gt;The biggest limitation here is actually sequential injection, where each additional sample adds a full chromatographic separation time. From a data perspective, each sample corresponds to an independent data file, which actually wastes data storage space. For example, when you look at the mass spectrum of sample A eluting at 10 minutes, this spectrum only contains substances from sample A, and most of the space in a full scan is just noise. We can then consider increasing the amount of information by adding more samples while keeping the data space for each sample constant. In simple terms, we don&amp;rsquo;t wait for one sample to completely finish before injecting the next. Instead, we continuously inject different samples at a fixed time interval, for example, every 1 minute. This way, the data collected at the 10-minute mark of the run will contain substances from sample A separated for 10 minutes, substances from sample B separated for 9 minutes, substances from sample C separated for 8 minutes, and so on. The figure below is an example. The first 10 minutes show the complete separation of 10 substances. What follows is the data that appears when 9 identical samples are injected at one-minute intervals. To the left of the red dashed line is conventional injection, where the mass-to-charge ratio vs. retention time data space is sparse and not utilized efficiently. However, with fixed-interval injections as shown on the right, we can see that the data storage space utilization is significantly improved.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/cn/2025/08/12/sfi/images/demo.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;If the chromatographic column&amp;rsquo;s separation performance is reliable, then theoretically, isomers eluting at different retention times will interfere with other samples only if their retention times differ by a whole multiple of the injection interval (one minute in this example). In this scenario, the marginal time required for each additional sample is theoretically only one minute. This makes it possible to analyze more than a thousand samples per day. At the same time, the interference from the background or matrix is naturally shortened, mobile phase consumption is greatly reduced, and under isocratic elution, complex peak alignment steps are largely unnecessary. It could be said that in the world of techniques, speed conquers all.&lt;/p&gt;
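&lt;p&gt;As a rough sanity check on these numbers, the sketch below compares sequential and fixed-interval injection over a 24-hour day. The function names are hypothetical, and the calculation assumes a 15-minute sequential method, a 10-minute isocratic separation, and a 1-minute injection interval, while ignoring QC injections and equilibration time.&lt;/p&gt;

```python
MINUTES_PER_DAY = 24 * 60


def sequential_capacity(run_minutes):
    """Samples per day when each injection waits for the previous run to finish."""
    return MINUTES_PER_DAY // run_minutes


def staggered_capacity(separation_minutes, interval_minutes):
    """Samples per day when a new sample is injected every interval_minutes.

    The last injection must still leave time for one full separation.
    """
    usable = MINUTES_PER_DAY - separation_minutes
    return usable // interval_minutes + 1


print(sequential_capacity(15))    # 96
print(staggered_capacity(10, 1))  # 1431
```

&lt;p&gt;Under these assumptions, the fixed-interval design is limited by the injection interval rather than by the separation time.&lt;/p&gt;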
&lt;p&gt;This idea did not come from nowhere. In fact, flow injection technology emerged in the 1970s to increase throughput. Later, with the development of chromatographic technology, methods involving fixed-time-interval injection with isocratic elution also appeared. However, these methods were not for analyzing unknown compounds but for known ones, such as in high-throughput drug screening. If you already know the retention time of a substance, you can find the peak position for the next sample by simply adding the retention time to the injection interval. But what if you don&amp;rsquo;t know what you are measuring? This brings us back to the previously mentioned scenario. I can inject samples this way, but how do I process the data? As shown in the figure below, the data we collect is like the superimposed image on the left, and we need to recover the individual images on the right.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/cn/2025/08/12/sfi/images/pic.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
&lt;p&gt;I submitted an abstract on this topic to this year&amp;rsquo;s American Society for Mass Spectrometry (ASMS) conference, primarily to solve this problem, and recently submitted a preprint to make this technology public. The solution is quite simple. While we indeed don&amp;rsquo;t know what we are measuring, we don&amp;rsquo;t necessarily need to. The most crucial piece of information is that a specific mass-to-charge ratio will appear at a specific retention time. We can obtain this information by running a complete separation of a pooled sample before conducting the fixed-interval injections. Then, we just need to get all the combinations of retention times and mass-to-charge ratios that appear in the pooled sample. Subsequently, the information for a given substance in different samples can be obtained by looking for the response of that mass-to-charge ratio at the expected peak time, which is calculated as its original retention time plus the product of the fixed time interval and the sample injection number. This is like fishing: we don&amp;rsquo;t know where the fish will appear, but if we have baited a spot beforehand, we can find the corresponding fish by casting our line near that spot. Therefore, this strategy can be understood as &amp;ldquo;baiting the spot before fishing,&amp;rdquo; or &amp;ldquo;Fishing-style Injection.&amp;rdquo; Note that this method does not require finding a peak for the substance in every sample; it only provides a spatial range to look for the peak. If no peak is found, it means the corresponding sample does not contain that substance. The figure below is a demonstration of the algorithm: the retention time of peak A in the pooled sample, plus the fixed time interval multiplied by (injection number - 1), plus the full separation time of the QC sample, equals the retention time of peak A in the sample with that injection number.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/cn/2025/08/12/sfi/images/SFIsimpleAlgorithm.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
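&lt;p&gt;The retention time rule described above can be sketched in a few lines of Python. This is only an illustration, not the code of the open-source package: the function names, the tuple layout of the scans, and the m/z tolerance are assumptions, and the 10-second retention time tolerance is borrowed from the shift reported in the tests.&lt;/p&gt;

```python
from math import isclose


def expected_rt(pooled_rt, injection_number, interval, pooled_run_time):
    """Expected retention time of a pooled-sample peak in the sample with the
    given 1-based injection number (all times in the same unit)."""
    return pooled_rt + interval * (injection_number - 1) + pooled_run_time


def fish_peak(scans, target_mz, target_rt, mz_tol=0.005, rt_tol=10 / 60):
    """Return the highest intensity inside the m/z and retention time window,
    or 0.0 if nothing falls inside it (substance not found in that sample).

    scans is an iterable of (rt_minutes, mz, intensity) tuples.
    """
    hits = [
        intensity
        for rt, mz, intensity in scans
        if isclose(mz, target_mz, rel_tol=0.0, abs_tol=mz_tol)
        and isclose(rt, target_rt, rel_tol=0.0, abs_tol=rt_tol)
    ]
    return max(hits, default=0.0)


scans = [
    (13.02, 180.0634, 5200.0),  # close to the expected position
    (13.50, 180.0634, 900.0),   # same m/z, outside the time window
    (13.01, 255.2320, 4100.0),  # different m/z
]
target = expected_rt(pooled_rt=1.0, injection_number=3, interval=1.0, pooled_run_time=10.0)
print(target)                              # 13.0
print(fish_peak(scans, 180.0634, target))  # 5200.0
```

&lt;p&gt;Returning 0.0 when the window is empty mirrors the point that a missing peak simply means the sample does not contain that substance.&lt;/p&gt;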
&lt;p&gt;An idea without action is cheap. I noted this idea in my notebook back in 2017, but at the time, I lacked the ability and resources to turn it into action. My current work allows for free exploration, so I revisited this idea and validated it through collaboration in the lab. The most critical part of the practical implementation is to decouple the communication between the chromatography injection and the mass spectrometry acquisition, allowing them to operate independently according to a pre-designed injection sequence without interrupting the MS data collection.&lt;/p&gt;
&lt;p&gt;We tested the performance of Fishing-style Injection on two chromatography columns in both positive and negative ion modes. We injected 158 samples in each mode, including a pooled sample as a QC every ten samples, with the total injection time controlled to under six hours. A conventional injection approach would have required several days and a multi-batch design. Because the marginal time added for each additional sample in this method is equal to the fixed time interval (one minute), it is possible to analyze one thousand samples a day while ensuring each sample has 10 minutes of separation time. Repeated tests showed that the retention time shift was within 10 seconds; the reversed-phase column actually performed better, and the shift on the HILIC column was also within an acceptable range. In terms of quantitative performance, within the 10-minute isocratic elution window, although we found 1000-3000 peaks during the &amp;ldquo;baiting&amp;rdquo; process, after considering only the peaks with a response deviation of less than 30% in the intra-sequence pooled samples and removing peaks found in the blank, we could still obtain 450-738 stable peaks. Again, the reversed-phase column performed better than the HILIC column. This result is already in the same order of magnitude as the number of peaks obtained with gradient injection after applying similar quality control standards. Furthermore, the issue of isomeric interference mentioned earlier does exist, but for blood metabolomics specifically, fewer than 2% of the peaks were affected. If you are analyzing lipids, you should definitely evaluate this effect first and optimize for a fixed interval time that minimizes interference. If throughput is a priority and you have tested that the compounds you care about can be separated by the column, then Fishing-style Injection offers a natural advantage in speed and cost. 
The cost per sample can be controlled to under $10, which is at least an order of magnitude lower than current injection methods, making high-throughput mass spectrometry screening a reality. Moreover, I have already written an open-source software package for the most critical data processing part of this method. If you want to try this method, there is no need for physical modifications to your existing instrument; you just need to control the chromatography injection and data acquisition separately. In fact, the mobile phase can be switched to gradient elution, but the injection interval would no longer be fixed and would need to be optimized for the column. Alternatively, the interval could be kept fixed, but the algorithm would need to recognize the elution pattern. I will leave these parts for future improvements by others.&lt;/p&gt;
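&lt;p&gt;The quality filter mentioned above (response deviation below 30% in the intra-sequence pooled samples, and no signal in the blank) could be sketched as follows. The function name and the dictionary layout are hypothetical, and the relative standard deviation is used here as the measure of response deviation, which is an assumption.&lt;/p&gt;

```python
from statistics import mean, stdev


def stable_peaks(qc_responses, blank_responses, max_rsd=0.30):
    """Keep peak ids whose pooled-QC responses have a relative standard
    deviation below max_rsd and that were not detected in the blank.

    qc_responses maps a peak id to its responses in the pooled QC injections
    (at least two values per peak). blank_responses maps a peak id to its
    response in the blank; a missing id or 0 means the peak is absent there.
    """
    kept = []
    for peak_id, values in qc_responses.items():
        average = mean(values)
        if average == 0:
            continue  # never detected in the QC injections
        rsd = stdev(values) / average
        in_blank = blank_responses.get(peak_id, 0) != 0
        if max_rsd > rsd and not in_blank:
            kept.append(peak_id)
    return kept


qc = {"p1": [100, 110, 95], "p2": [100, 300, 20], "p3": [50, 52, 49]}
blank = {"p3": 40}
print(stable_peaks(qc, blank))  # ['p1']
```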
&lt;p&gt;Feel free to try it out!&lt;/p&gt;
&lt;p&gt;Preprint link: &lt;a href=&#34;https://chemrxiv.org/engage/chemrxiv/article-details/6897568d728bf9025ec3ab62&#34;&gt;https://chemrxiv.org/engage/chemrxiv/article-details/6897568d728bf9025ec3ab62&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Introducing ThermoFlask: Simplifying Thermo .raw File Processing</title>
      <link>https://yufree.cn/en/2025/05/14/introducing-thermoflask-simplifying-thermo-raw-file-processing/</link>
      <pubDate>Wed, 14 May 2025 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2025/05/14/introducing-thermoflask-simplifying-thermo-raw-file-processing/</guid>
      <description>&lt;p&gt;In the world of high-resolution mass spectrometry, Thermo &lt;code&gt;.raw&lt;/code&gt; files are a common format for storing raw data. However, converting these files into more accessible formats like mzML can be cumbersome. To make this process easier, I developed ThermoFlask, a Flask-based web application designed to simplify Thermo .raw file processing.&lt;/p&gt;
&lt;h2 id=&#34;what-is-thermoflask&#34;&gt;What is ThermoFlask?&lt;/h2&gt;
&lt;p&gt;ThermoFlask is a lightweight, user-friendly web application that leverages the power of the ThermoRawFileParser to process .raw files. Whether you&amp;rsquo;re a researcher, data scientist, or bioinformatician, ThermoFlask provides an intuitive interface to upload, process, and download your data in just a few clicks. You can also deploy it as a web service; here is a demo &lt;a href=&#34;https://fywupilibssa.us-east-1.clawcloudrun.com&#34;&gt;website&lt;/a&gt; running the Docker image. The source code is available on &lt;a href=&#34;https://github.com/yufree/thermoflask&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;key-features&#34;&gt;Key Features&lt;/h2&gt;
&lt;p&gt;ThermoFlask is packed with features to make your workflow seamless:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Batch Processing: Upload multiple .raw files at once for batch conversion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flexible Output Formats: Convert .raw files to mzML, indexed mzML, Parquet, MGF, or metadata-only formats.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Custom Parameters: Add custom arguments for advanced processing with the ThermoRawFileParser.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Debugging Support: View command output directly in the interface for troubleshooting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Download Results: Easily download processed files through the web interface.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-to-get-started&#34;&gt;How to Get Started&lt;/h2&gt;
&lt;p&gt;Getting started with ThermoFlask is easy! Here’s a quick guide:&lt;/p&gt;
&lt;p&gt;You can pull the prebuilt Docker image from Docker Hub and run it locally using the following command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;docker run -p 5000:5000 yufree/thermoflask:latest
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Upload and Process Files: Open the web interface, upload your .raw files, select the desired output format, and start processing. Once complete, download your results directly from the interface.&lt;/p&gt;
&lt;p&gt;Alternatively, you can build the Docker image from the source code. Here’s how:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Build the Docker Image: Clone the &lt;a href=&#34;https://github.com/yufree/thermoflask&#34;&gt;repository&lt;/a&gt; and run the following command to build the Docker image:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;docker build -t thermoflask .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Run the Application: Start the application with:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;docker run -p 5000:5000 thermoflask
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The application will be accessible at http://localhost:5000.&lt;/p&gt;
&lt;h2 id=&#34;acknowledgments&#34;&gt;Acknowledgments&lt;/h2&gt;
&lt;p&gt;ThermoFlask wouldn’t be possible without the incredible work of the ThermoRawFileParser team. I also thank the Flask community for providing a robust framework for building web applications.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Updates of GlobalStd algorithm in pmd package</title>
      <link>https://yufree.cn/en/2025/05/06/updates-of-globalstd-algorithm-in-pmd-package/</link>
      <pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2025/05/06/updates-of-globalstd-algorithm-in-pmd-package/</guid>
      <description>&lt;p&gt;Recently, I updated the GlobalStd algorithm in the pmd package. The new version includes several improvements and bug fixes. Here are the details:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;getpaired&lt;/code&gt; function was rewritten to improve the detection of redundant peaks. Paired mass distances (PMDs) are calculated within each retention time cluster, and redundant peaks are then identified in each cluster. For isotopologue pairs, the new version detects them using network clusters of isotopologues, and the lowest-mass peak is retained for the following analysis. I also added a feature to detect multiply charged isotopologue clusters; those ions are removed from the following analysis. I then updated the multiply charged ion detection to remove all of the paired ions from further analysis. The remaining paired ions are used to calculate the frequency of PMDs within the cluster, and each PMD is counted only once per cluster. After detecting high-frequency PMDs within retention time clusters, that is, potential &amp;ldquo;common in-source reactions&amp;rdquo;, this function returns the PMDs for those redundant peaks, as well as labels for isotopologues, multiply charged ions, and multiply charged isotopologues. Here, the ions retained in the high-frequency PMDs are not treated as multiply charged ions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Compared with the previous version, the new function adds detection of multiply charged isotopologues and improves the detection of redundant peaks. The code is easier to read because the inner functions are isolated. The output is similar to that of the previous version but more accurate. I removed the &lt;code&gt;pmd2&lt;/code&gt; column from the output data frame; users can always calculate those values from &lt;code&gt;pmd&lt;/code&gt;, and the corresponding plot function now has a &lt;code&gt;digits&lt;/code&gt; parameter to control the visualization.&lt;/p&gt;
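&lt;p&gt;To make the counting rule concrete (each PMD is counted at most once per retention time cluster), here is a minimal sketch. It is written in Python purely for illustration and is not the implementation in the pmd package, which is in R; the function name, the input layout, and the rounding to two digits are assumptions.&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations


def pmd_frequency(peaks, digits=2):
    """Count in how many retention time clusters each paired mass distance occurs.

    peaks is an iterable of (mz, cluster_id) tuples. Within a cluster, a PMD
    that appears several times is still counted only once.
    """
    by_cluster = {}
    for mz, cluster_id in peaks:
        by_cluster.setdefault(cluster_id, []).append(mz)

    counts = Counter()
    for mzs in by_cluster.values():
        # A set removes repeated distances inside the same cluster.
        pmds = {round(abs(a - b), digits) for a, b in combinations(mzs, 2)}
        counts.update(pmds)
    return counts


peaks = [
    (100.00, 1), (121.98, 1), (118.01, 1),
    (200.00, 2), (221.98, 2),
    (300.00, 3), (321.98, 3), (343.96, 3),
]
counts = pmd_frequency(peaks)
print(counts[21.98])  # 3
```

&lt;p&gt;In this toy input, cluster 3 contains the distance 21.98 twice, but it contributes only one count.&lt;/p&gt;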
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;getstd&lt;/code&gt; function was also rewritten to retain independent peaks. The new function does not change the original workflow but is easier to maintain and read. The &lt;code&gt;corcutoff&lt;/code&gt; parameter was moved to the &lt;code&gt;getpaired&lt;/code&gt; function, because the correlation cutoff is used to filter paired ions rather than standard ions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;globalstd&lt;/code&gt; function is a wrapper function for &lt;code&gt;getpaired&lt;/code&gt; and &lt;code&gt;getstd&lt;/code&gt; functions and has been updated accordingly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I renamed the &lt;code&gt;getcluster&lt;/code&gt; function to &lt;code&gt;getpseudospectrum&lt;/code&gt; and rewrote it to focus on recovering pseudo spectra from an MS1 feature table. This function no longer uses &lt;code&gt;getstd&lt;/code&gt; and instead calls &lt;code&gt;getpaired&lt;/code&gt; directly to generate the pseudo spectra. If independent peaks are used, an extra merge step is involved, as in the previous version, and the number of pseudo spectra will always be smaller than the number of independent peaks. It still outputs a vector &lt;code&gt;stdmassindex2&lt;/code&gt; to find the base peak of each pseudo spectrum. It also reports the coverage of explainable ions (~70%) in the data. Ions within a cluster that do not pass the PMD relation correlation cutoff are treated as one pseudo spectrum, which might introduce false positives for data using a correlation cutoff but keeps the opportunity to check them later.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;getcorcluster&lt;/code&gt; function has been renamed to &lt;code&gt;getcorpseudospectrum&lt;/code&gt;. The new version uses the same logic as the previous version to detect pseudo spectra based on the correlation of ion intensities, and it also outputs the independent peaks (the peak with the largest m/z) and the base peaks (the peak with the largest intensity) in their corresponding pseudo spectra. Users can use either &lt;code&gt;getpseudospectrum&lt;/code&gt; or &lt;code&gt;getcorpseudospectrum&lt;/code&gt; to generate pseudo spectra; the latter produces fewer pseudo spectra. I recommend using &lt;code&gt;getpseudospectrum&lt;/code&gt;, which gives a better explanation of the ions. The returned object contains a &lt;code&gt;pseudo&lt;/code&gt; table instead of a &lt;code&gt;clusters&lt;/code&gt; table for further investigation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;globalstd&lt;/code&gt; vignette has been updated.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are using the previous version, it&amp;rsquo;s recommended to update the package to the latest version on GitHub. The new version is more robust and easier to use/maintain. If your code failed after the updates, it might come from the removal of &lt;code&gt;pmd2&lt;/code&gt; column and you can always generate this column by &lt;code&gt;round(pmd,digits)&lt;/code&gt;. If you have any questions or feedback, please feel free to reach out to me.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Reactomics analysis for MS-only data</title>
      <link>https://yufree.cn/en/2024/05/29/reactomics-analysis-for-ms-only-data/</link>
      <pubDate>Wed, 29 May 2024 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2024/05/29/reactomics-analysis-for-ms-only-data/</guid>
      <description>


&lt;p&gt;Recently, I received multiple requests for reactomics analysis of MS-only data such as FT-ICR MS or MS imaging data. It seems best to summarize the answer with an example for reference. Here you are!&lt;/p&gt;
&lt;p&gt;When retention time is not available, the m/z vector can still be used to check reaction-level changes. To apply this analysis, you need to install the development version (&amp;gt;=0.2.6) of the pmd package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;remotes::install_github(&amp;#39;yufree/pmd&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Downloading GitHub repo yufree/pmd@HEAD&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## ── R CMD build ─────────────────────────────────────────────────────────────────
## * checking for file ‘/private/var/folders/nj/68q18qjd2x1cb8my282c58cr0000gn/T/Rtmpx52AfQ/remotes44f5531ee188/yufree-pmd-87e8de1/DESCRIPTION’ ... OK
## * preparing ‘pmd’:
## * checking DESCRIPTION meta-information ... OK
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## * building ‘pmd_0.2.6.tar.gz’&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can still use &lt;code&gt;getrda&lt;/code&gt; to find the high frequency PMDs.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(pmd)
data(spmeinvivo)
# get the m/z
mz &amp;lt;- spmeinvivo$mz
# get the m/z intensity for all m/z, the row order is the same with mz
insms &amp;lt;- spmeinvivo$data
# check high frequency pmd
sda &amp;lt;- getrda(mz)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 164462 pmd found.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 20 pmd used.&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;colnames(sda)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;0&amp;quot;       &amp;quot;1.001&amp;quot;   &amp;quot;1.002&amp;quot;   &amp;quot;1.003&amp;quot;   &amp;quot;1.004&amp;quot;   &amp;quot;2.015&amp;quot;   &amp;quot;2.016&amp;quot;  
##  [8] &amp;quot;14.015&amp;quot;  &amp;quot;17.026&amp;quot;  &amp;quot;18.011&amp;quot;  &amp;quot;21.982&amp;quot;  &amp;quot;28.031&amp;quot;  &amp;quot;28.032&amp;quot;  &amp;quot;44.026&amp;quot; 
## [15] &amp;quot;67.987&amp;quot;  &amp;quot;67.988&amp;quot;  &amp;quot;88.052&amp;quot;  &amp;quot;116.192&amp;quot; &amp;quot;135.974&amp;quot; &amp;quot;135.975&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# save them as numeric vector
hfpmd &amp;lt;- as.numeric(colnames(sda))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then the &lt;code&gt;getpmddf&lt;/code&gt; function can be used to extract all the paired ions for a given PMD.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# get details for certain pmd
pmddf &amp;lt;- getpmddf(mz,pmd=18.011,digits = 3)
# add intensity for all the paired ions
mz1ins &amp;lt;- insms[match(pmddf$ms1,mz),]
mz2ins &amp;lt;- insms[match(pmddf$ms2,mz),]
# get the pmd pair intensity
pmdins &amp;lt;- mz1ins+mz2ins
# get the pmd total intensity across samples
pmdinsall &amp;lt;- apply(pmdins,2,sum)
# show the PMD intensity
pmdinsall&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 1405_Fish1_F1 1405_Fish1_F2 1405_Fish1_F3 1405_Fish2_F1 1405_Fish2_F2 
##       9898514       7801273      10363201       5847334      10479551 
## 1405_Fish2_F3 1405_Fish3_F1 1405_Fish3_F2 1405_Fish3_F3 
##       7021375      10584976      12989961      12559649&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also calculate the static or dynamic PMD intensity for m/z only data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# get the ratio of larger m/z over smaller m/z
ratio &amp;lt;- mz2ins/mz1ins
# filter PMD based on RSD% across samples
# cutoff 30%
cutoff &amp;lt;- 0.3
# get index for static PMD
rsdidx &amp;lt;- apply(ratio,1,function(x) sd(x)/mean(x)&amp;lt;cutoff)
# get static PMD
pmddfstatic &amp;lt;- pmddf[rsdidx,]
# get static intensity
pmdinsstatic &amp;lt;- pmdins[rsdidx,]
# normalize the ions pair intensity to avoid influences from large response factors
pmdinsstaticscale &amp;lt;- t(scale(t(pmdinsstatic)))
# get the pmd static intensity across samples
pmdinsstaticall &amp;lt;- apply(pmdinsstaticscale,2,sum)
# show the PMD static intensity for each sample
pmdinsstaticall&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 1405_Fish1_F1 1405_Fish1_F2 1405_Fish1_F3 1405_Fish2_F1 1405_Fish2_F2 
##         1.027       -16.704         2.374       -27.241        12.434 
## 1405_Fish2_F3 1405_Fish3_F1 1405_Fish3_F2 1405_Fish3_F3 
##       -17.758         7.924        19.803        18.142&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# get index for dynamic PMD
rsdidx &amp;lt;- apply(ratio,1,function(x) sd(x)/mean(x)&amp;gt;=cutoff)
# get dynamic PMD
pmddfdynamic &amp;lt;- pmddf[rsdidx,]
# get dynamic intensity for ms1 and ms2
pmdinsdynamicms1 &amp;lt;- apply(mz1ins[rsdidx,],1,function(x) sd(x)/mean(x))
pmdinsdynamicms2 &amp;lt;- apply(mz2ins[rsdidx,],1,function(x) sd(x)/mean(x))
# find the stable ms and use ratio as intensity
idx &amp;lt;- pmdinsdynamicms1&amp;gt;pmdinsdynamicms2
pmdinsdynamic &amp;lt;- ratio[rsdidx,]
pmdinsdynamic[idx,] &amp;lt;- 1/ratio[rsdidx,][idx,]
# get the pmd dynamic intensity across samples
pmdinsdynamicall &amp;lt;- apply(pmdinsdynamic,2,sum)
# show the PMD dynamic intensity for each sample
pmdinsdynamicall&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 1405_Fish1_F1 1405_Fish1_F2 1405_Fish1_F3 1405_Fish2_F1 1405_Fish2_F2 
##         374.2         315.6         388.0         207.8         233.4 
## 1405_Fish2_F3 1405_Fish3_F1 1405_Fish3_F2 1405_Fish3_F3 
##         199.9         283.5         328.0         256.2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can also use the &lt;code&gt;getpmddf&lt;/code&gt; function to extract all the paired ions for multiple PMDs. You can then generate a network based on the output.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# get details for certain pmd
pmddf &amp;lt;- getpmddf(mz,pmd=hfpmd,digits = 3)
# viz by igraph package
library(igraph)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Attaching package: &amp;#39;igraph&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The following objects are masked from &amp;#39;package:stats&amp;#39;:
## 
##     decompose, spectrum&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The following object is masked from &amp;#39;package:base&amp;#39;:
## 
##     union&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;net &amp;lt;- graph_from_data_frame(pmddf,directed = F)
pal &amp;lt;- grDevices::rainbow(length(unique(E(net)$diff2)))
plot(net,vertex.label=NA,vertex.size = 5,edge.width = 3,edge.color = pal[as.numeric(as.factor(E(net)$diff2))],main = &amp;#39;PMD network&amp;#39;)
legend(&amp;quot;topright&amp;quot;,bty = &amp;quot;n&amp;quot;,
       legend=unique(E(net)$diff2),
       fill=unique(pal[as.numeric(as.factor(E(net)$diff2))]), border=NA,horiz = F)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2024/05/29/reactomics-analysis-for-ms-only-data/index_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;If you prefer to get a PMD network for a specific mass, you can still use the &lt;code&gt;getchain&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(spmeinvivo)
spmeinvivo$rt &amp;lt;- NULL
chain &amp;lt;- getchain(spmeinvivo,diff = c(2.02,14.02,15.99,58.04,13.98),mass = 286.3101,digits = 2,corcutoff = 0)
# show as network
net &amp;lt;- graph_from_data_frame(chain$sdac,directed = F)
pal &amp;lt;- grDevices::rainbow(5)
plot(net,vertex.label=round(as.numeric(V(net)$name),2),vertex.size =5,edge.width = 3,edge.color = pal[as.numeric(as.factor(E(net)$diff2))],vertex.label.dist=1,vertex.color=ifelse(round(as.numeric(V(net)$name),4) %in% 286.3101,&amp;#39;red&amp;#39;,&amp;#39;black&amp;#39;), main = &amp;#39;PMD network&amp;#39;)
legend(&amp;quot;topright&amp;quot;,bty = &amp;quot;n&amp;quot;,
       legend=unique(E(net)$diff2),
       fill=unique(pal[as.numeric(as.factor(E(net)$diff2))]), border=NA,horiz = F)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2024/05/29/reactomics-analysis-for-ms-only-data/index_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Call for Papers: Artificial Intelligence and Machine Learning for Environmental &amp; Health</title>
      <link>https://yufree.cn/en/2024/03/10/call-for-papers-artificial-intelligence-and-machine-learning-for-environmental-health/</link>
      <pubDate>Sun, 10 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2024/03/10/call-for-papers-artificial-intelligence-and-machine-learning-for-environmental-health/</guid>
      <description>&lt;p&gt;Artificial intelligence (AI) and machine learning (ML) are transformative fields of computer science. They empower researchers to develop algorithms and models capable of extracting meaningful insights, making predictions, and automating tasks by analyzing and learning from data. Due to the intricate nature of environmental health issues, the integration of AI and machine learning is imperative. These advanced computational technologies have the potential to revolutionize environmental health studies by fundamentally improving and advancing environmental exposure assessment, environmental health risk assessment, and environmental policy development.&lt;/p&gt;
&lt;p&gt;This Virtual Special Issue from &lt;a href=&#34;https://pubs.acs.org/journal/ehnea2&#34;&gt;Environment &amp;amp; Health&lt;/a&gt; extends an invitation to scientists to share their innovative work on leveraging AI and ML for environmental health studies.&lt;/p&gt;
&lt;p&gt;We welcome contributions of Articles, Reviews, Perspectives or Viewpoints that delve into topics including, but not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Source apportionment&lt;/li&gt;
&lt;li&gt;Chemical toxicity prediction&lt;/li&gt;
&lt;li&gt;Identification or screening of unknown pollutants&lt;/li&gt;
&lt;li&gt;Human exposure assessment&lt;/li&gt;
&lt;li&gt;Molecular mechanisms between exposure and disease&lt;/li&gt;
&lt;li&gt;Data compliance and ethics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By sharing these findings or perspectives, we hope to spur further innovation and advancements in this critical and rapidly evolving field.&lt;/p&gt;
&lt;p&gt;Organizing Editors
Miao Yu, Ph.D., Guest Editor
The Jackson Laboratory, USA&lt;/p&gt;
&lt;p&gt;Mingliang Fang, Ph.D., Guest Editor
Fudan University, China&lt;/p&gt;
&lt;p&gt;Zhenyu Tian, Ph.D., Guest Editor
Northeastern University, USA&lt;/p&gt;
&lt;p&gt;Bin Wang, Ph.D., Guest Editor
Peking University, China&lt;/p&gt;
&lt;p&gt;Douglas Walker, Ph.D., Guest Editor
Emory University, USA&lt;/p&gt;
&lt;p&gt;Yuming Guo, Ph.D., Associate Editor, Environment &amp;amp; Health
Monash University, Australia&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Molecular networking in R</title>
      <link>https://yufree.cn/en/2023/06/25/molecular-networking-in-r/</link>
      <pubDate>Sun, 25 Jun 2023 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2023/06/25/molecular-networking-in-r/</guid>
      <description>&lt;p&gt;I saw a lot of research using molecular networking at this year&amp;rsquo;s ASMS annual meeting. However, I couldn&amp;rsquo;t find R code or an R package for molecular networking. It seems most people who use molecular networking rely on GNPS and don&amp;rsquo;t say much about the algorithm behind it. In this post, I will give a brief introduction to molecular networking and show some quick-and-dirty code to perform it in R.&lt;/p&gt;
&lt;h2 id=&#34;what-is-molecular-networking&#34;&gt;What is molecular networking?&lt;/h2&gt;
&lt;p&gt;Molecular networking builds a network of molecules linked by MS2 similarity. In the network, nodes represent compounds with different MS2 spectra and edges represent the similarity of their MS2 spectra. When two compounds are connected by an edge, they should be structurally similar and may share similar biological functions.&lt;/p&gt;
&lt;p&gt;From this definition, we know the precursors of connected compounds should be different. This is the major difference between molecular networking and MS2 spectra matching. In MS2 spectra matching, the purpose is to identify unknown MS2 spectra. In molecular networking, the purpose is to group similar compounds. If one node in the network is a known compound, we can infer that the nodes connected to it are similar compounds, such as its metabolites or congeners. Though most GNPS users treat molecular networking as an annotation tool, its most distinctive feature is that the network can be interpreted for biological purposes. In the &lt;a href=&#34;https://www.pnas.org/doi/10.1073/pnas.1203689109&#34;&gt;original publication&lt;/a&gt; of molecular networking, the tool was designed to find new natural products, not only to identify compounds. This post likewise does not focus on identification and cares more about the relation network among molecules.&lt;/p&gt;
&lt;h2 id=&#34;how-to-define-ms2-similairy&#34;&gt;How to define MS2 similarity?&lt;/h2&gt;
&lt;p&gt;If you are familiar with MS2 spectra matching, you might know that the precursors of two matched spectra should be the same or differ only by an isotopologue shift. Molecular networking, however, compares spectra with different precursors using what the original paper calls modified cosine similarity.&lt;/p&gt;
&lt;p&gt;Before we discuss the modified cosine similarity, let&amp;rsquo;s review cosine similarity. Cosine similarity is very straightforward. If we have two vectors like [1,10,1] and [10,100,10], the cosine similarity is the normalized dot product, which can also be interpreted as the cosine of the angle between the two vectors. For vectors [1,10,1] and [10,100,10], the value should be:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$$ cos(\theta) = \frac{(1*10 + 10*100 + 1*10)}{\sqrt{1*1+10*10+1*1} * \sqrt{10*10+100*100+10*10}} = 1 $$&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;In this case, the cosine value is 1 and the angle is 0: the two vectors point in the same direction even though their magnitudes differ, so they are identical in terms of cosine similarity.&lt;/p&gt;
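&lt;p&gt;The same calculation can be checked in a few lines of R. This is only a minimal sketch: the helper name &lt;code&gt;cosine_similarity&lt;/code&gt; is made up for illustration and is not part of any package.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Cosine similarity: normalized dot product of two numeric vectors
# (cosine_similarity is a hypothetical helper name)
cosine_similarity &amp;lt;- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

a &amp;lt;- c(1, 10, 1)
b &amp;lt;- c(10, 100, 10)
# b is a scaled copy of a, so the angle between them is 0
sim &amp;lt;- cosine_similarity(a, b)
isTRUE(all.equal(sim, 1))
&lt;/code&gt;&lt;/pre&gt;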
&lt;p&gt;For MS2 spectra matching, the two vectors are the intensities of peaks with the same m/z. You therefore need to define an m/z tolerance to align the two MS2 spectra before calculating the cosine similarity.&lt;/p&gt;
&lt;p&gt;OK, I hope the regular way to compare two MS2 spectra is clear now. Next we need to modify this algorithm for molecular networking:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Step 1: Calculate the paired mass distance between the two precursors&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Step 2: Apply this mass distance to every peak of the query MS2 spectrum to generate a shifted version of the spectrum with the same intensity profile&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Step 3: Align the m/z of the query MS2 spectrum (both the original and the shifted version) with the target MS2 spectrum&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Step 4: Calculate the cosine similarity of the aligned intensities; this is the modified cosine similarity&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I know this may still be confusing, so here is an example. Compound A has m/z 300 as precursor and m/z [100,200,250] as fragment ions with intensity [100,200,300]. Compound B has m/z 315.995 as precursor and m/z [100, 200, 265.995] as fragment ions with intensity [10,20,30].&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Step 1: the paired mass distance of precursors is 15.995&lt;/li&gt;
&lt;li&gt;Step 2: we generate a shifted version of spectrum A with m/z [115.995, 215.995, 265.995] and intensity [100,200,300]&lt;/li&gt;
&lt;li&gt;Step 3: Align both the original and the shifted spectrum A with spectrum B. We get aligned m/z [100,200,265.995] with intensity [100,200,300] for A and [10,20,30] for B&lt;/li&gt;
&lt;li&gt;Step 4: the cosine similarity between A and B is 1, which means A and B are structurally similar to each other&lt;/li&gt;
&lt;/ul&gt;
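&lt;p&gt;The four steps can be sketched in base R on this toy example. This is a rough illustration, not the GNPS or matchms implementation. It assumes compound B&amp;rsquo;s precursor is m/z 315.995 (A&amp;rsquo;s precursor plus the 15.995 shift), uses an assumed tolerance of 0.01 Da, and the helper &lt;code&gt;match_peaks&lt;/code&gt; is a made-up name. It also normalizes over the matched peaks only and does not resolve peaks that match more than once, which a real implementation has to handle.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Toy spectra from the example (precursor of B assumed to be 315.995)
spec_a &amp;lt;- data.frame(mz = c(100, 200, 250), ins = c(100, 200, 300))
spec_b &amp;lt;- data.frame(mz = c(100, 200, 265.995), ins = c(10, 20, 30))
pmz_a &amp;lt;- 300
pmz_b &amp;lt;- 315.995
tol &amp;lt;- 0.01  # assumed m/z tolerance in Da

# Step 1: paired mass distance between the precursors
pmd &amp;lt;- pmz_b - pmz_a

# Steps 2-3: pair peaks of A (optionally shifted) with peaks of B
# match_peaks is a hypothetical helper, not a package function
match_peaks &amp;lt;- function(mz1, mz2, shift, tol) {
  hits &amp;lt;- which(abs(outer(mz1 + shift, mz2, &amp;quot;-&amp;quot;)) &amp;lt;= tol, arr.ind = TRUE)
  data.frame(a = hits[, &amp;quot;row&amp;quot;], b = hits[, &amp;quot;col&amp;quot;])
}
pairs &amp;lt;- rbind(match_peaks(spec_a$mz, spec_b$mz, 0, tol),    # original A
               match_peaks(spec_a$mz, spec_b$mz, pmd, tol))  # shifted A

# Step 4: cosine similarity of the aligned intensities
x &amp;lt;- spec_a$ins[pairs$a]
y &amp;lt;- spec_b$ins[pairs$b]
mod_cos &amp;lt;- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
mod_cos
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the unshifted alignment pairs the ions at m/z 100 and 200, and the shifted alignment pairs m/z 250 from A with m/z 265.995 from B.&lt;/p&gt;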
&lt;p&gt;In the above example, compound B is the oxidized metabolite of compound A. One fragment ion carries the mass shift of oxidation, while the smaller ions do not contain the oxidized part of the molecule. In our example, m/z 250 from A is aligned to m/z 265.995 from B after accounting for the mass shift of the precursors, while the other two ions (m/z 100 and 200) are aligned directly between the raw spectra of A and B. All three fragment ions therefore enter the modified cosine similarity calculation. This scenario is common for real-world compounds, which makes the approach a smart way to link compounds with similar MS2 spectra. We can now infer that compound B is likely a metabolite of compound A by interpreting the mass shift between the precursors.&lt;/p&gt;
&lt;h2 id=&#34;different-between-molecular-networking-and-pmd-network&#34;&gt;Differences between molecular networking and PMD network&lt;/h2&gt;
&lt;p&gt;I also &lt;a href=&#34;https://www.nature.com/articles/s42004-020-00403-z&#34;&gt;published&lt;/a&gt; tools to construct a paired mass distance (PMD) network from MS1-only data. You might ask how molecular networking and the PMD network differ. Here are the similarities and differences:&lt;/p&gt;
&lt;p&gt;In both the PMD network and molecular networking, nodes are different compounds and a connection can be displayed as a paired mass distance or mass shift. Both can be used to interpret relations among the compounds found in certain samples.&lt;/p&gt;
&lt;p&gt;In a PMD network, the paired mass distance is the mass difference between two MS1 ions. To perform PMD network analysis, you first need to remove the redundant peaks from the same compound with the GlobalStd algorithm. Only predefined PMDs are used for connections. The PMD list can be generated from domain knowledge or purely from the frequency of PMDs among ions. When a PMD is found again and again, the corresponding reaction should be considered an important relation.&lt;/p&gt;
&lt;p&gt;In molecular networking, the paired mass distance is calculated between the precursors of two MS2 spectra. Modified cosine similarity is used to define the connection, which can also be interpreted through mass shifts. Here, you don&amp;rsquo;t need to specify the mass shifts of precursors in advance; the algorithm does this job. The only issue is that you need high quality MS2 data. In my experience, MS2 data collected for a given project tend to &amp;lsquo;identify&amp;rsquo; a similar profile of compounds, and DDA mode usually covers only 10-20% of the MS1 ions found in the corresponding MS1 full scan data.&lt;/p&gt;
&lt;p&gt;In my opinion, if you prefer high coverage of the compounds in your samples, try the PMD network on MS1 data first and then collect pseudotargeted MS2 data based on your PMD network results for modified cosine similarity matching. On the other hand, if you prefer high confidence of identification from the very beginning, try molecular networking directly.&lt;/p&gt;
&lt;h2 id=&#34;r-code-for-molecuar-networking&#34;&gt;R code for molecular networking&lt;/h2&gt;
&lt;p&gt;Here are two functions for molecular networking. I wrote them after reading the &lt;a href=&#34;https://matchms.readthedocs.io/en/latest/api/matchms.similarity.ModifiedCosine.html&#34;&gt;python code&lt;/a&gt; of the matchms package.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;find_matches&lt;/code&gt; aligns the m/z values of two MS2 spectra and &lt;code&gt;mnmatch&lt;/code&gt; performs molecular networking on an MS2 spectra file. &lt;code&gt;mnmatch&lt;/code&gt; only calculates the modified cosine similarity among all the MS2 spectra in one file and returns a list with two elements: a data table for the network and a list of matched spectra for detailed checks.&lt;/p&gt;
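&lt;p&gt;As a rough base R sketch of the predefined-PMD idea (not the pmd package itself), the snippet below connects ions whose mass difference matches a short PMD list. The m/z values, the PMD list and the 0.002 Da tolerance are all made up for illustration, and the ions are assumed to have already passed redundant peak removal.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical MS1 m/z values after removing redundant peaks
mz &amp;lt;- c(180.063, 196.058, 210.074, 300.000)
# Assumed predefined PMDs: oxidation (+O) and methylation (+CH2)
pmd_list &amp;lt;- c(oxidation = 15.995, methylation = 14.016)
tol &amp;lt;- 0.002  # assumed tolerance in Da

# All ion pairs with their paired mass distance
idx &amp;lt;- t(combn(seq_along(mz), 2))
edges &amp;lt;- data.frame(from = idx[, 1], to = idx[, 2],
                    pmd = abs(mz[idx[, 2]] - mz[idx[, 1]]))

# Label each pair with the first predefined PMD it matches, if any
edges$reaction &amp;lt;- sapply(edges$pmd, function(d) {
  m &amp;lt;- which(abs(d - pmd_list) &amp;lt;= tol)
  if (length(m) &amp;gt; 0) names(pmd_list)[m[1]] else NA_character_
})

# Keep only the pairs connected by a predefined PMD
edges &amp;lt;- edges[!is.na(edges$reaction), ]
edges
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The remaining rows form the edge list of the network; a frequency-based PMD list could be built instead by tabulating the rounded mass differences of all pairs.&lt;/p&gt;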
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;find_matches &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(spec1_mz, spec2_mz, tolerance, shift &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;) {
        matches &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;data.frame&lt;/span&gt;()
        &lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(peak1_idx in &lt;span style=&#34;color:#a6e22e&#34;&gt;seq_along&lt;/span&gt;(spec1_mz)) {
                mz &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; spec1_mz[peak1_idx]
                low_bound &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; mz &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; tolerance
                high_bound &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; mz &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; tolerance
                &lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(peak2_idx in &lt;span style=&#34;color:#a6e22e&#34;&gt;seq_along&lt;/span&gt;(spec2_mz)) {
                        mz2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; spec2_mz[peak2_idx] &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; shift
                        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(mz2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; high_bound &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&lt;/span&gt; mz2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; low_bound) {
                                matches &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rbind.data.frame&lt;/span&gt;(matches,
                                                            &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(peak1_idx, peak2_idx))
                        }
                }
        }
        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;nrow&lt;/span&gt;(matches) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;) {
                &lt;span style=&#34;color:#a6e22e&#34;&gt;colnames&lt;/span&gt;(matches) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;query&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;query2&amp;#39;&lt;/span&gt;)
                &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(matches)
        } else{
                &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;NULL&lt;/span&gt;)
        }
}
mnmatch &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(spectra,
                    binstep,
                    cf,
                    npeaks) {
        matches &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;list&lt;/span&gt;()
        intersected_indices &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;()
        &lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(i in &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(spectra) &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)) {
                &lt;span style=&#34;color:#a6e22e&#34;&gt;for &lt;/span&gt;(j &lt;span style=&#34;color:#a6e22e&#34;&gt;in &lt;/span&gt;(i &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(spectra)) {
                        ins &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;intensity&lt;/span&gt;(spectra)[[i]]
                        ins &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; ins &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sum&lt;/span&gt;(ins)
                        pmz &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;precursorMz&lt;/span&gt;(spectra[i])
                        pmz2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;precursorMz&lt;/span&gt;(spectra[j])
                        diff &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; pmz &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; pmz2
                        rt &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rtime&lt;/span&gt;(spectra[i])
                        rt2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rtime&lt;/span&gt;(spectra[j])
                        diffrt &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; rt &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; rt2
                        insx1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insx2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insnx1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insnx2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;()
                        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;abs&lt;/span&gt;(diffrt) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt;) {
                                &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;abs&lt;/span&gt;(diff) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; binstep) {
                                        query &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(spectra)[[i]]
                                        query2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(spectra)[[j]]
                                        re &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;find_matches&lt;/span&gt;(query,
                                                           query2,
                                                           binstep,
                                                           shift &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;is.null&lt;/span&gt;(re)) {
                                                insn &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;intensity&lt;/span&gt;(spectra)[[j]]
                                                insn &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insn &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sum&lt;/span&gt;(insn)
                                                insnx1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insn[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query2]
                                                insx1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; ins[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query]
                                        }
                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(insnx1) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; npeaks) {
                                                cos &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;crossprod&lt;/span&gt;(insx1,
                                                                  insnx1) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sqrt&lt;/span&gt;(
                                                                          &lt;span style=&#34;color:#a6e22e&#34;&gt;crossprod&lt;/span&gt;(insx1) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;crossprod&lt;/span&gt;(insnx1)
                                                                  )
                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(cos) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; cf) {
                                                        intersected_indices &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;rbind&lt;/span&gt;(
                                                                        intersected_indices,
                                                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(
                                                                                i,
                                                                                j,
                                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;as.numeric&lt;/span&gt;(
                                                                                        cos
                                                                                ),
                                                                                diff
                                                                        )
                                                                )
                                                        ms1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; query[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query]
                                                        ms2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; query2[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query2]
                                                        query &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ms1,
                                                                ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; insx1
                                                        )
                                                        query2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ms2,
                                                                ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; insnx1
                                                        )
                                                        queryraw &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                        mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(
                                                                                spectra
                                                                        )[[i]],
                                                                        ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ins
                                                                )
                                                        query2raw &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                        mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(
                                                                                spectra
                                                                        )[[j]],
                                                                        ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; insn
                                                                )
                                                        diff &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; diff
                                                        matcht &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;list&lt;/span&gt;(
                                                                query,
                                                                query2,
                                                                queryraw,
                                                                query2raw,
                                                                diff
                                                        )
                                                        matches &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;append&lt;/span&gt;(matches,
                                                                          &lt;span style=&#34;color:#a6e22e&#34;&gt;list&lt;/span&gt;(
                                                                                  matcht
                                                                          ))
                                                }
                                        }
                                } else{
                                        query &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(spectra)[[i]]
                                        query2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(spectra)[[j]]
                                        re &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;find_matches&lt;/span&gt;(query,
                                                           query2,
                                                           binstep,
                                                           shift &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;is.null&lt;/span&gt;(re)) {
                                                insn &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;intensity&lt;/span&gt;(spectra)[[j]]
                                                insn &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insn &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sum&lt;/span&gt;(insn)
                                                insnx1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insn[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query2]
                                                insx1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; ins[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query]
                                        }
                                        re2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;find_matches&lt;/span&gt;(query,
                                                            query2,
                                                            binstep,
                                                            shift &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; diff)
                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;is.null&lt;/span&gt;(re2)) {
                                                insn &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;intensity&lt;/span&gt;(spectra)[[j]]
                                                insn &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insn &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sum&lt;/span&gt;(insn)
                                                insnx2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; insn[re2&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query2]
                                                insx2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; ins[re2&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query]
                                        }
                                        insx &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(insx1, insx2)
                                        insnx &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(insnx1, insnx2)
                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;length&lt;/span&gt;(insx) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; npeaks) {
                                                cos &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;crossprod&lt;/span&gt;(insx, insnx) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;sqrt&lt;/span&gt;(
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;crossprod&lt;/span&gt;(insx) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;crossprod&lt;/span&gt;(insnx)
                                                        )
                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(cos) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; cf) {
                                                        intersected_indices &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;rbind&lt;/span&gt;(
                                                                        intersected_indices,
                                                                        &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(
                                                                                i,
                                                                                j,
                                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;as.numeric&lt;/span&gt;(
                                                                                        cos
                                                                                ),
                                                                                diff
                                                                        )
                                                                )
                                                        ms1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(query[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query],
                                                                 query[re2&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query])
                                                        ms2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(query2[re&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query2],
                                                                 query2[re2&lt;span style=&#34;color:#f92672&#34;&gt;$&lt;/span&gt;query2])
                                                        query &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                        mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ms1&lt;span style=&#34;color:#a6e22e&#34;&gt;[order&lt;/span&gt;(ms1)],
                                                                        ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; insx&lt;span style=&#34;color:#a6e22e&#34;&gt;[order&lt;/span&gt;(ms1)]
                                                                )
                                                        query2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                        mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ms2&lt;span style=&#34;color:#a6e22e&#34;&gt;[order&lt;/span&gt;(ms2)],
                                                                        ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; insnx&lt;span style=&#34;color:#a6e22e&#34;&gt;[order&lt;/span&gt;(ms2)]
                                                                )
                                                        queryraw &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                        mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(
                                                                                spectra
                                                                        )[[i]],
                                                                        ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; ins
                                                                )
                                                        query2raw &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;
                                                                &lt;span style=&#34;color:#a6e22e&#34;&gt;cbind.data.frame&lt;/span&gt;(
                                                                        mz &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mz&lt;/span&gt;(
                                                                                spectra
                                                                        )[[j]],
                                                                        ins &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; insn
                                                                )
                                                        diff &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; diff
                                                        matcht &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;list&lt;/span&gt;(
                                                                query,
                                                                query2,
                                                                queryraw,
                                                                query2raw,
                                                                diff
                                                        )
                                                        matches &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;append&lt;/span&gt;(matches,
                                                                          &lt;span style=&#34;color:#a6e22e&#34;&gt;list&lt;/span&gt;(
                                                                                  matcht
                                                                          ))
                                                }
                                        }
                                }
                        }
                }
        }
        &lt;span style=&#34;color:#a6e22e&#34;&gt;if &lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;nrow&lt;/span&gt;(intersected_indices) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;) {
                &lt;span style=&#34;color:#a6e22e&#34;&gt;colnames&lt;/span&gt;(intersected_indices) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;query&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;query2&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;cos&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;diff&amp;#39;&lt;/span&gt;)
                intersected_indices &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;as.data.frame&lt;/span&gt;(intersected_indices)
        } else {
                intersected_indices &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;NULL&lt;/span&gt;
        }
        &lt;span style=&#34;color:#a6e22e&#34;&gt;return&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;list&lt;/span&gt;(intersected_indices, matches))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The usage is simple. You need to prepare a file of MS2 spectra; the example below reads an MSP file, and an MGF file would need the matching backend instead. Here we use a bin step of 0.001 Da to align the m/z values of two spectra, a minimum of 5 peaks for matching, and a cosine similarity cutoff of 0.6:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-r&#34; data-lang=&#34;r&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(Spectra)
specs &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;Spectra&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;YOURFILE.msp&amp;#39;&lt;/span&gt;, source &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; MsBackendMsp&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;MsBackendMsp&lt;/span&gt;())
result &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;mnmatch&lt;/span&gt;(specs,binstep&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0.001&lt;/span&gt;,cf&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0.6&lt;/span&gt;,npeaks&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;5&lt;/span&gt;)
table &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; result[[1]]
&lt;span style=&#34;color:#a6e22e&#34;&gt;library&lt;/span&gt;(igraph)
net &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; igraph&lt;span style=&#34;color:#f92672&#34;&gt;::&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;graph_from_data_frame&lt;/span&gt;(table,directed &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; FALSE)
&lt;span style=&#34;color:#75715e&#34;&gt;# display molecular networking&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;plot&lt;/span&gt;(net)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Invitation to Submit Manuscripts for a Special Issue of Chemosphere</title>
      <link>https://yufree.cn/en/2022/07/11/invitation-to-submit-manuscripts-for-a-special-issue-of-chemosphere/</link>
      <pubDate>Mon, 11 Jul 2022 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2022/07/11/invitation-to-submit-manuscripts-for-a-special-issue-of-chemosphere/</guid>
      <description>&lt;p&gt;The prestigious journal &lt;em&gt;Chemosphere&lt;/em&gt; is currently running a special issue entitled &amp;quot;Human Health Effects of Chemical Mixture Exposures&amp;quot;. As guest editors for this issue, we welcome contributions from various disciplines and kindly invite you to consider submitting your full paper to this special issue.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Guest editors&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Prof. Dr. Peng Gao University of Pittsburgh School of Public Health &lt;a href=&#34;mailto:peg47@pitt.edu&#34;&gt;peg47@pitt.edu&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Prof. Dr. Hui Peng University of Toronto &lt;a href=&#34;mailto:hui.peng@utoronto.ca&#34;&gt;hui.peng@utoronto.ca&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dr. Miao Yu The Jackson Laboratory &lt;a href=&#34;mailto:miao.yu@jax.org&#34;&gt;miao.yu@jax.org&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Special issue information&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Current environmental chemistry and toxicology studies mainly focus on a single stressor or a single group of stressors, which does not reflect the multiple stressors in the dynamic exposome that humans are facing. Human exposures usually occur as cocktails of thousands of organic chemicals and dozens of inorganic chemicals. However, the relationships and interactions among those stressors in the environment and their holistic human health effects remain unclear. Fortunately, the rapid development of various techniques makes it possible to reveal these mixture exposures. This Chemosphere special issue aims to provide a platform to dissect the complexity of chemical mixture exposures from experimental, analytical, and computational perspectives.&lt;/p&gt;
&lt;p&gt;Manuscript submission information:
The submission website for this journal is located &lt;a href=&#34;https://www.editorialmanager.com/chemosphere/default.aspx&#34;&gt;here&lt;/a&gt;. Author guidelines and manuscript submission to Chemosphere can be found &lt;a href=&#34;https://www.elsevier.com/journals/chemosphere/0045-6535/guide-for-authors&#34;&gt;here&lt;/a&gt;. To ensure that your manuscript is correctly submitted to the special issue, please select “VSI: Exposure of Mixture” when you reach the “Article Type” step of the submission process.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Keywords&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Human exposure, Chemical mixture, Health effects&lt;/p&gt;
&lt;p&gt;Learn &lt;a href=&#34;https://www.elsevier.com/authors/submit-your-paper/special-issues&#34;&gt;more&lt;/a&gt; about the benefits of publishing in a special issue.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Using xcmsrocker on HPC via Singularity</title>
      <link>https://yufree.cn/en/2022/05/26/using-xcmsrocker-on-hpc-via-singularity/</link>
      <pubDate>Thu, 26 May 2022 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2022/05/26/using-xcmsrocker-on-hpc-via-singularity/</guid>
      <description>&lt;p&gt;Docker is probably the most popular container platform. Distributing containers via Docker Hub makes it easy to provide an all-in-one development and data analysis environment for scientists. It&amp;rsquo;s a good idea to use containers on a high performance computing (HPC) cluster to accelerate data processing. However, since Docker gives root access to the system it runs on, it is usually not allowed on HPC. Singularity, on the other hand, is friendlier to scientific computing, with MPI support as well as tighter security restrictions.&lt;/p&gt;
&lt;p&gt;I released the &lt;a href=&#34;https://github.com/yufree/xcmsrocker&#34;&gt;xcmsrocker&lt;/a&gt; image for metabolomics data analysis a long time ago and have always said that it should be easy to deploy on HPC or a cloud computing platform. That is true for the latter: you can use the Docker image on most popular clouds. However, you will need some extra work for HPC.&lt;/p&gt;
&lt;p&gt;The first step is to build a Singularity image from the Docker image hosted on Docker Hub. Load the Singularity module after logging in to the HPC:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ml singularity
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Then pull the xcmsrocker image:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;singularity pull docker://yufree/xcmsrocker:latest
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now you will find a file named &amp;lsquo;xcmsrocker_latest.sif&amp;rsquo; in your home folder. If your HPC uses Slurm for job management, you can use the following job script and save it as a file called &amp;ldquo;rstudio-server.job&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/sh
#SBATCH --time=05:00:00
#SBATCH --signal=USR2
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8192
#SBATCH --output=/home/%u/rstudio-server.job.%j

# Create temporary directory to be populated with directories to bind-mount in the container
# where writable file systems are necessary. Adjust path as appropriate for your computing environment.
workdir=$(python -c &#39;import tempfile; print(tempfile.mkdtemp())&#39;)

mkdir -p -m 700 ${workdir}/run ${workdir}/tmp ${workdir}/var/lib/rstudio-server
cat &amp;gt; ${workdir}/database.conf &amp;lt;&amp;lt;END
provider=sqlite
directory=/var/lib/rstudio-server
END

# Set OMP_NUM_THREADS to prevent OpenBLAS (and any other OpenMP-enhanced
# libraries used by R) from spawning more threads than the number of processors
# allocated to the job.
#
# Set R_LIBS_USER to a path specific to rocker/rstudio to avoid conflicts with
# personal libraries from any R installation in the host environment

cat &amp;gt; ${workdir}/rsession.sh &amp;lt;&amp;lt;END
#!/bin/sh
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}
export R_LIBS_USER=${HOME}/R/xcmsrocker
exec rsession &amp;quot;\${@}&amp;quot;
END

chmod +x ${workdir}/rsession.sh

export SINGULARITY_BIND=&amp;quot;${workdir}/run:/run,${workdir}/tmp:/tmp,${workdir}/database.conf:/etc/rstudio/database.conf,${workdir}/rsession.sh:/etc/rstudio/rsession.sh,${workdir}/var/lib/rstudio-server:/var/lib/rstudio-server&amp;quot;

# Do not suspend idle sessions.
# Alternative to setting session-timeout-minutes=0 in /etc/rstudio/rsession.conf
# https://github.com/rstudio/rstudio/blob/v1.4.1106/src/cpp/server/ServerSessionManager.cpp#L126
export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0

export SINGULARITYENV_USER=$(id -un)
export SINGULARITYENV_PASSWORD=$(openssl rand -base64 15)
# get unused socket per https://unix.stackexchange.com/a/132524
# tiny race condition between the python &amp;amp; singularity commands
readonly PORT=$(python -c &#39;import socket; s=socket.socket(); s.bind((&amp;quot;&amp;quot;, 0)); print(s.getsockname()[1]); s.close()&#39;)
cat 1&amp;gt;&amp;amp;2 &amp;lt;&amp;lt;END
1. SSH tunnel from your workstation using the following command:

   ssh -N -L 8787:${HOSTNAME}:${PORT} ${SINGULARITYENV_USER}@LOGIN-HOST

   and point your web browser to http://localhost:8787

2. log in to RStudio Server using the following credentials:

   user: ${SINGULARITYENV_USER}
   password: ${SINGULARITYENV_PASSWORD}

When done using RStudio Server, terminate the job by:

1. Exit the RStudio Session (&amp;quot;power&amp;quot; button in the top right corner of the RStudio window)
2. Issue the following command on the login node:

      scancel -f ${SLURM_JOB_ID}
END

singularity exec --cleanenv xcmsrocker_latest.sif \
    rserver --www-port ${PORT} \
            --auth-none=0 \
            --auth-pam-helper-path=pam-helper \
            --auth-stay-signed-in-days=30 \
            --auth-timeout-minutes=0 \
            --server-user XXX \
            --rsession-path=/etc/rstudio/rsession.sh
printf &#39;rserver exited&#39; 1&amp;gt;&amp;amp;2
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This file is modified from &lt;a href=&#34;https://www.rocker-project.org/use/singularity/&#34;&gt;Rocker&amp;rsquo;s singularity tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here, you need to change &lt;code&gt;--server-user XXX&lt;/code&gt; to your user name on the HPC. For example, my HPC user name is &amp;lsquo;yufree&amp;rsquo;, so I set &lt;code&gt;--server-user yufree&lt;/code&gt;. This option makes sure that you can log in to your RStudio Server and that the default user doesn&amp;rsquo;t have access.&lt;/p&gt;
&lt;p&gt;Then submit this job to HPC:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ sbatch rstudio-server.job
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You should then see a file with the job ID as its extension, such as &amp;lsquo;rstudio-server.job.xxxxxxx&amp;rsquo;, in your HPC home folder, where &amp;lsquo;xxxxxxx&amp;rsquo; is your job ID. Check the content of this file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat rstudio-server.job.xxxxxxx
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You will find the user name, password and port information for the session on the HPC. The user name should be the same as your HPC user name, and the password changes every time you submit this job.&lt;/p&gt;
&lt;p&gt;To access RStudio on your local computer, you need to bind your local port to the running HPC port. You need to open a new terminal to establish the SSH tunnel:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh -N -L 8787:[YOUR_PORT_INFORMATION] [HPC_USERNAME]@[HPC domain]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here the port information (the compute node host name and port) comes from &lt;code&gt;rstudio-server.job.xxxxxxx&lt;/code&gt;, and &lt;code&gt;[HPC_USERNAME]@[HPC domain]&lt;/code&gt; is the same as the regular SSH address you use to log in to the HPC. This command forwards the HPC port to port 8787 on your local computer. After you open the SSH tunnel, you can access RStudio from xcmsrocker in your own browser: http://localhost:8787&lt;/p&gt;
&lt;p&gt;Now you can enjoy your xcmsrocker image on HPC. Keep in mind that only packages supporting parallel computing will benefit from HPC resources. If the software doesn&amp;rsquo;t support parallel computing, you will need to modify its source code, or running it on HPC will be a waste of time.&lt;/p&gt;
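&lt;p&gt;As a minimal sketch of what using the allocated cores can look like inside the RStudio session, the following R snippet reads &lt;code&gt;OMP_NUM_THREADS&lt;/code&gt;, which the &lt;code&gt;rsession.sh&lt;/code&gt; wrapper in the job script exports from the Slurm allocation, and passes it to &lt;code&gt;parallel::mclapply()&lt;/code&gt;. The toy task and the fallback to one core are assumptions for illustration only.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(parallel)

# Cores granted to the job; rsession.sh sets OMP_NUM_THREADS from Slurm.
# Fall back to 1 when the variable is missing or not a plain integer.
ncores &amp;lt;- suppressWarnings(as.integer(Sys.getenv(&amp;quot;OMP_NUM_THREADS&amp;quot;, unset = &amp;quot;1&amp;quot;)))
if (is.na(ncores) || ncores &amp;lt; 1) ncores &amp;lt;- 1L

# Toy task (assumption): square eight numbers, spread over the available cores
squares &amp;lt;- mclapply(1:8, function(i) i^2, mc.cores = ncores)
unlist(squares)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Reading the core count from the environment instead of hard-coding it keeps the R code in step with whatever &lt;code&gt;--cpus-per-task&lt;/code&gt; value the job script requests.&lt;/p&gt;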
</description>
    </item>
    
    <item>
      <title>I am looking for a faculty position</title>
      <link>https://yufree.cn/en/2021/09/23/i-am-looking-for-a-faculty-position/</link>
      <pubDate>Thu, 23 Sep 2021 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2021/09/23/i-am-looking-for-a-faculty-position/</guid>
      <description>&lt;p&gt;I am looking for a faculty position on earth. It&amp;rsquo;s always right to fit the position with your skill set. However, after sending a dozen applications with tailored resumes or research statements, I decided to leave my cover letter online with my desired research interests.&lt;/p&gt;
&lt;p&gt;I was trained as an environmental analytical chemist in a state key laboratory under the supervision of Prof. Guibin Jiang in China. Then I worked with Prof. Janusz Pawliszyn at the University of Waterloo, Canada, as a postdoc on projects about &lt;em&gt;in vivo&lt;/em&gt; SPME based metabolomics data analysis. After two years’ training, I joined the Institute for Exposomic Research at Mount Sinai for environmental exposure related bioinformatics studies and worked with Dr. Lauren Petrick. I have published 37 peer-reviewed journal papers, including 9 first author or co-first author papers. My publications have more than 800 citations and an h-index of 17. I have two papers selected as journal covers (AC and ES&amp;amp;T Letters) and one paper selected as an ES&amp;amp;T Letters 2018 best paper. I authored three R packages on CRAN and developed Shiny applications for my research. More details can be found in my CV.&lt;/p&gt;
&lt;p&gt;My research interests are the assessment of environmental exposures and their impacts on humans through high resolution mass spectrometry based metabolomics analysis. I can apply the &lt;em&gt;in vivo&lt;/em&gt; SPME technique to capture real-time changes in living organisms. I proposed the concept of “reactomics”, based on paired mass distances, to retrieve changes in the general chemical relationships in samples, and developed the related software and database. Besides, I proposed a concept called “gatekeeper” to explain the influence of multiple exposures or the exposome on health outcomes at the molecular level using metabolomics or other omics data. Those techniques and models can be used to understand the health impact of general environmental exposures. I can be either an experimental or a bioinformatics scientist. However, I will treat myself as a mass spectrometry guy who solves various environment-related scientific problems with both dry and wet lab skills.&lt;/p&gt;
&lt;p&gt;I hope to continuously develop reactomics tools to investigate the influences of certain exposure and perform gatekeeper discovery for population-based exposure studies. I am planning to introduce machine learning into the biomarker reaction discovery based on reactomics and gatekeeper model for certain diseases. I am willing to collaborate with other researchers for multidisciplinary research projects.&lt;/p&gt;
&lt;p&gt;Feel free to &lt;a href=&#34;mailto:yufree@live.cn&#34;&gt;contact&lt;/a&gt; me if you need extra information. Thank you for your consideration.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Correlation coefficients cutoff to generate network in metabolomics</title>
      <link>https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/</link>
      <pubDate>Wed, 28 Jul 2021 00:00:00 +0000</pubDate>
      
      <guid>https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/</guid>
      <description>
&lt;script src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;One common research purpose in metabolomics is to check the relations among the metabolites. A correlation network is one of the most popular ways to show such relations. However, the network changes with the cutoff chosen for the correlation coefficients.&lt;/p&gt;
&lt;p&gt;Let’s check some real world data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(pmd)
library(enviGCMS)
data(spmeinvivo)
# remove redundant peaks
newmet &amp;lt;- globalstd(spmeinvivo)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 75 retention time cluster found.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 369 paired masses found&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 5 unique within RT clusters high frequency PMD(s) used for further investigation.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The unique within RT clusters high frequency PMD(s) is(are)  28.03 21.98 44.03 17.03 18.01.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 719 isotopologue(s) related paired mass found.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 492 multi-charger(s) related paired mass found.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 8 retention group(s) have single peaks. 14 23 32 33 54 55 56 75&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 11 group(s) with multiple peaks while no isotope/paired relationship 4 5 7 8 11 41 42 49 68 72 73&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 9 group(s) with multiple peaks with isotope without paired relationship 2 9 22 26 52 62 64 66 70&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 4 group(s) with paired relationship without isotope 1 10 15 18&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 43 group(s) with paired relationship and isotope 3 6 12 13 16 17 19 20 21 24 25 27 28 29 30 31 34 35 36 37 38 39 40 43 44 45 46 47 48 50 51 53 57 58 59 60 61 63 65 67 69 71 74&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 291 std mass found.&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;metabolites &amp;lt;- getfilter(spmeinvivo,rowindex = newmet$stdmassindex)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Originally we have 1459 peaks. After removal of redundant peaks such as isotopes, adducts and neutral losses by the globalstd algorithm, we have 291 peaks as potential metabolites. To check their relations, we will calculate the pairwise correlation coefficients among their intensities.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;metcor &amp;lt;- cor(t(metabolites$data))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s check the distribution of correlation coefficients:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hist(metcor)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Since correlation coefficients are also associated with a p value, we can also check the distribution of p values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cor.test.p &amp;lt;- function(x){
    FUN &amp;lt;- function(x, y) cor.test(x, y)[[&amp;quot;p.value&amp;quot;]]
    z &amp;lt;- outer(
      colnames(x), 
      colnames(x), 
      Vectorize(function(i,j) FUN(x[,i], x[,j]))
    )
    dimnames(z) &amp;lt;- list(colnames(x), colnames(x))
    z
}

pmat &amp;lt;- cor.test.p(t(metabolites$data))
hist(pmat)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sum(pmat&amp;lt;0.05)/length(pmat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.4145&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;About 41% of the original p values are less than 0.05. We can filter the correlation coefficients based on this rule.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;metcor2 &amp;lt;- metcor[pmat&amp;lt;0.05]
hist(metcor2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;range(abs(metcor2))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.6664 1.0000&lt;/code&gt;&lt;/pre&gt;
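&lt;p&gt;Here we can find the cutoff is around +/-0.67. However, we didn’t correct for multiple testing. Note that &lt;code&gt;p.adjust()&lt;/code&gt; uses the Holm method by default; pass &lt;code&gt;method = &amp;quot;BH&amp;quot;&lt;/code&gt; for Benjamini-Hochberg FDR control. If we adjust the p values, we will have a different cutoff.&lt;/p&gt;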
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmat_adj &amp;lt;- p.adjust(pmat)
metcor3 &amp;lt;- metcor[pmat_adj&amp;lt;0.05]
range(abs(metcor3))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.9881 1.0000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now the cutoff is 0.99. We can display the data as network:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;metcor[pmat&amp;gt;=0.05] &amp;lt;- 0
library(igraph)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Attaching package: &amp;#39;igraph&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The following objects are masked from &amp;#39;package:stats&amp;#39;:
## 
##     decompose, spectrum&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The following object is masked from &amp;#39;package:base&amp;#39;:
## 
##     union&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;net &amp;lt;- graph.adjacency(metcor,weighted=TRUE,diag=FALSE,mode = &amp;#39;undirected&amp;#39;)
plot(net,vertex.size=1,edge.width=1,vertex.label=&amp;quot;&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here it seems that all metabolites are connected; using the adjusted p values will solve this issue.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;metcor &amp;lt;- cor(t(metabolites$data))
metcor[pmat_adj&amp;gt;=0.05] &amp;lt;- 0
net &amp;lt;- graph.adjacency(metcor,weighted=TRUE,diag=FALSE,mode = &amp;#39;undirected&amp;#39;)
plot(net,vertex.size=1,edge.width=1,vertex.label=&amp;quot;&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here we see a network with a few large clusters and lots of single metabolites without any association with each other.&lt;/p&gt;
&lt;p&gt;If we don’t consider the p values, we can also check the networks with different cutoffs.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;n &amp;lt;- c()
for (i in seq(0,1,0.1)) {
        metcor &amp;lt;- cor(t(metabolites$data))
        metcor[metcor&amp;lt;i] &amp;lt;- 0
        net &amp;lt;- graph.adjacency(metcor,weighted=TRUE,diag=FALSE,mode = &amp;#39;undirected&amp;#39;)
        # plot(net,vertex.size=1,edge.width=1,vertex.label=&amp;quot;&amp;quot;)
        cn &amp;lt;- components(net)
        # check the numbers of cluster
        n &amp;lt;- c(n,length(table(membership(cn))[table(membership(cn))&amp;gt;1]))
}
plot(seq(0,1,0.1),n,xlab=&amp;#39;cutoff&amp;#39;,ylab = &amp;#39;cluster number&amp;#39;,type = &amp;#39;l&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here we can see that the number of clusters first increases and then decreases. Let’s check the interval [0.8, 1] more carefully.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;n &amp;lt;- c()
for (i in seq(0.8,1,0.001)) {
        metcor &amp;lt;- cor(t(metabolites$data))
        metcor[metcor&amp;lt;i] &amp;lt;- 0
        net &amp;lt;- graph.adjacency(metcor,weighted=TRUE,diag=FALSE,mode = &amp;#39;undirected&amp;#39;)
        # plot(net,vertex.size=1,edge.width=1,vertex.label=&amp;quot;&amp;quot;)
        cn &amp;lt;- components(net)
        # check the numbers of cluster
        n &amp;lt;- c(n,length(table(membership(cn))[table(membership(cn))&amp;gt;1]))
}
plot(seq(0.8,1,0.001),n,xlab=&amp;#39;cutoff&amp;#39;,ylab = &amp;#39;cluster number&amp;#39;,type = &amp;#39;l&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# display the cutoff
seq(0.8,1,0.001)[which.max(n)]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.988&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we find that the cutoff giving the maximum number of network clusters (0.988) is close to the cutoff obtained from the adjusted p values (0.99). However, the computation is much faster. When the cutoff is small, all metabolites are connected. When the cutoff is large, few metabolites are covered. Physically, the largest number of network clusters means covering the largest number of connected metabolites with the largest separation between clusters. I think this should be the fastest way to select a cutoff from real world data.&lt;/p&gt;
&lt;p&gt;Actually, I added a function called &lt;code&gt;getcf()&lt;/code&gt; to the &lt;code&gt;enet&lt;/code&gt; package to automatically find this cutoff for correlation network analysis. Here is the network for our demo data:&lt;/p&gt;
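&lt;p&gt;The scan above can be wrapped into two small helpers. This is only a minimal sketch of the idea, not a packaged implementation: the names &lt;code&gt;count_clusters()&lt;/code&gt; and &lt;code&gt;scan_cutoff()&lt;/code&gt; are hypothetical, and the simulated two-block data stands in for a real correlation matrix.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(igraph)

# Hypothetical helper: number of clusters with more than one member
# after zeroing every correlation below the cutoff
count_clusters &amp;lt;- function(cormat, cutoff) {
        adj &amp;lt;- cormat
        adj[adj &amp;lt; cutoff] &amp;lt;- 0
        net &amp;lt;- graph_from_adjacency_matrix(adj, weighted = TRUE, diag = FALSE, mode = &amp;#39;undirected&amp;#39;)
        sum(components(net)$csize &amp;gt; 1)
}

# Hypothetical helper: scan candidate cutoffs and keep the first one
# that gives the largest number of clusters
scan_cutoff &amp;lt;- function(cormat, cutoffs = seq(0.8, 1, 0.001)) {
        n &amp;lt;- vapply(cutoffs, function(cf) count_clusters(cormat, cf), numeric(1))
        cutoffs[which.max(n)]
}

# Simulated data (assumption): two blocks of five correlated features
set.seed(1)
base1 &amp;lt;- rnorm(20)
base2 &amp;lt;- rnorm(20)
x &amp;lt;- cbind(sapply(1:5, function(i) base1 + rnorm(20, sd = 0.05)),
           sapply(1:5, function(i) base2 + rnorm(20, sd = 0.05)))
scan_cutoff(cor(x), seq(0, 1, 0.05))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because &lt;code&gt;which.max()&lt;/code&gt; returns the first maximum, ties are resolved in favor of the smaller cutoff, which keeps more metabolites in the network.&lt;/p&gt;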
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;metcor &amp;lt;- cor(t(metabolites$data))
metcor[metcor&amp;lt;seq(0.8,1,0.001)[which.max(n)]] &amp;lt;- 0
net &amp;lt;- graph.adjacency(metcor,weighted=TRUE,diag=FALSE,mode = &amp;#39;undirected&amp;#39;)
plot(net,vertex.size=1,edge.width=1,vertex.label=&amp;quot;&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://yufree.cn/en/2021/07/28/correlation-coefficients-cutoff-to-generate-network-in-metabolomics/index_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The reason to avoid p values or adjusted p values from correlation tests is not only the slow computation, but also that the cutoff for p values or adjusted p values is chosen by the researcher instead of by the data themselves. A p value cutoff will not help us find biological functional modules when all the metabolites are connected. In my opinion, each data set can speak for itself through an automated cutoff selection process, and I think the number of network clusters can do this job.&lt;/p&gt;
&lt;p&gt;PS. I actually use the same idea to generate the PMD metabolite network, which can be treated as another relation among metabolites with chemical meaning.&lt;/p&gt;
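&lt;p&gt;On the speed point, much of the cost comes from calling &lt;code&gt;cor.test()&lt;/code&gt; once per pair. For Pearson correlation, the same two-sided p values can be computed in one step from the correlation matrix, using the t statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The following is a minimal sketch assuming complete data without missing values; &lt;code&gt;fast_cor_p()&lt;/code&gt; is a hypothetical helper name and the simulated matrix is only for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: Pearson p value matrix for the columns of x
fast_cor_p &amp;lt;- function(x) {
        n &amp;lt;- nrow(x)
        r &amp;lt;- cor(x)
        # guard against values slightly outside [-1, 1] from rounding
        r &amp;lt;- pmin(pmax(r, -1), 1)
        tstat &amp;lt;- r * sqrt((n - 2) / (1 - r^2))
        p &amp;lt;- r
        p[] &amp;lt;- 2 * pt(-abs(tstat), df = n - 2)
        p
}

# Simulated data (assumption): 10 samples in rows, 5 features in columns
set.seed(1)
x &amp;lt;- matrix(rnorm(50), nrow = 10, dimnames = list(NULL, paste0(&amp;#39;m&amp;#39;, 1:5)))
pmat_fast &amp;lt;- fast_cor_p(x)
# compare one entry with cor.test()
all.equal(pmat_fast[&amp;#39;m1&amp;#39;, &amp;#39;m2&amp;#39;], cor.test(x[, &amp;#39;m1&amp;#39;], x[, &amp;#39;m2&amp;#39;])$p.value)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This removes the per-pair loop, but it does not change the second objection: the p value threshold is still chosen by the researcher.&lt;/p&gt;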
</description>
    </item>
    
  </channel>
</rss>
