Article

NPOmix: A machine learning classifier to connect mass spectrometry fragmentation data to biosynthetic gene clusters

Tiago Leão (Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba 13400-970, SP, Brazil)
Mingxun Wang (Center for Computational Mass Spectrometry, University of California San Diego, La Jolla, CA 92093, USA)
Ricardo Silva (NPPNS, Physic and Chemistry Department, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto 14040-900, Brazil)
Alexey Gurevich (Center for Algorithmic Biotechnology, St. Petersburg State University, St Petersburg 199004, Russia)
Anelize Bauermeister (Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA)
Paulo Wender Portal Gomes (Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA)
Asker Brejnrod (Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA)
Evgenia Glukhov (Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093, USA)
Allegra T. Aron (Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA)
Joris J. R. Louwen (Bioinformatics Group, Wageningen University, 6708 PB Wageningen, The Netherlands)
Hyun Woo Kim (College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University, Gyeonggi-do 10326, Korea)
Raphael Reher (Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, 35043 Marburg, Germany)
Marli Fátima Fiore (Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba 13400-970, SP, Brazil)
Justin J. J. van der Hooft (Bioinformatics Group, Wageningen University, 6708 PB Wageningen, The Netherlands)
Lena Gerwick (Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093, USA)
William H. Gerwick (Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093, USA)
Nuno Bandeira (Center for Computational Mass Spectrometry, University of California San Diego, La Jolla, CA 92093, USA)
Pieter C. Dorrestein (Center for Microbiome Innovation, University of California San Diego, La Jolla, CA 92093, USA)

2022, English

ABI

Abstract

Microbial specialized metabolites are an important source of, and inspiration for, many pharmaceuticals and biotechnological products, and they play key roles in ecological processes. Untargeted metabolomics using liquid chromatography coupled with tandem mass spectrometry is an efficient technique for accessing metabolites from fractions and even from crude environmental extracts. Nevertheless, metabolomics is limited in its ability to predict structures or bioactivities for cryptic metabolites. Efficiently linking the biosynthetic potential inferred from (meta)genomics to the specialized metabolome would accelerate drug discovery programs by allowing metabolomics to make use of genetic predictions. Here, we present a k-nearest neighbor classifier that systematically connects mass spectrometry fragmentation spectra to their corresponding biosynthetic gene clusters (independent of their chemical class). Our new pattern-based genome mining pipeline links biosynthetic genes to the metabolites they encode, as detected via mass spectrometry in bacterial cultures or environmental microbiomes. Using paired datasets from the Paired Omics Data Platform that include validated links between genes and mass spectra, we demonstrate this approach by automatically linking 18 previously known mass spectra (17 whose biosynthetic gene clusters are available in the MIBiG database, plus palmyramide A) to their corresponding biosynthetic genes, which had previously been validated experimentally (e.g., via nuclear magnetic resonance or genetic engineering). We also provide a computational example, reproducible by users, of how to use our Natural Products Mixed Omics (NPOmix) tool for siderophore mining. We conclude that NPOmix minimizes the need for culturing (it performed well on microbiomes) and facilitates specialized metabolite prioritization based on integrative omics mining.
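The abstract describes the classifier only at a high level. The following is a minimal sketch of the general k-nearest neighbor idea, not NPOmix's implementation. It assumes each spectrum is a list of (m/z, intensity) peaks, turns spectra into 1 Da binned vectors compared by cosine similarity, and labels each training spectrum with a hypothetical BGC or gene cluster family identifier (e.g. `GCF_siderophore`). The bin width, k = 3, the majority-vote rule, and all function and label names are illustrative assumptions.

```python
import math
from collections import Counter

# Assumption: 1 Da bins, chosen only for illustration.
BIN_WIDTH = 1.0


def bin_spectrum(peaks, bin_width=BIN_WIDTH):
    """Convert (m/z, intensity) peaks into a sparse vector {bin index: summed intensity}."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def knn_predict(query_peaks, training, k=3):
    """Predict a (hypothetical) BGC label for a query spectrum.

    training: list of (peaks, label) pairs, i.e. spectra already linked to a BGC.
    The label is chosen by majority vote among the k most similar training spectra.
    """
    q = bin_spectrum(query_peaks)
    scored = sorted(
        ((cosine(q, bin_spectrum(peaks)), label) for peaks, label in training),
        key=lambda t: t[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    # Ties go to the label encountered first, i.e. the one with the most similar neighbor.
    return votes.most_common(1)[0][0]


# Toy, made-up training data: two spectra per hypothetical gene cluster family.
training = [
    ([(100.0, 1.0), (150.0, 0.5), (200.0, 0.2)], "GCF_siderophore"),
    ([(100.2, 0.9), (150.1, 0.6)], "GCF_siderophore"),
    ([(300.0, 1.0), (350.0, 0.8)], "GCF_lipopeptide"),
    ([(300.4, 1.0), (351.0, 0.7)], "GCF_lipopeptide"),
]
query = [(100.1, 1.0), (150.3, 0.4)]

if __name__ == "__main__":
    print(knn_predict(query, training))  # → GCF_siderophore
```

Binned cosine similarity is the simplest spectral comparison to show. Fixed bins can split near-identical peaks that fall on either side of a bin edge, which is one reason tolerance-based or more refined spectral similarity scores are often preferred in practice.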

Identifiers

DOI: 10.1093/pnasnexus/pgac257

Citations and references

Citations: 2; references used: 0

Показатели — AkademScholar