Article

NCBI prokaryotic genome annotation pipeline

Tatiana Tatusova, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
Michael DiCuccio, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
Azat Badretdin, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
Vyacheslav Chetvernin, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
Eric P. Nawrocki, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
Leonid Zaslavsky, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
Alexandre Lomsadze, Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, GA 30332, USA
Kim D. Pruitt, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
Mark Borodovsky, Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, GA 30332, USA; School of Computational Science and Engineering, Georgia Tech, Atlanta, GA 30332, USA; [email protected]
James Ostell, National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA

2016, en

ABI

Abstract

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of the structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment-based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, NCBI's new Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, and more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generating and analyzing annotation across the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.
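The evidence-first idea in the abstract (trust homology where it exists, fall back to statistical prediction elsewhere) can be illustrated with a toy sketch. This is not GeneMarkS+ or any part of PGAP: the `Gene` record, the `source` labels, and the `merge_calls` function are all hypothetical names invented here, and the rule shown (drop an ab initio call whenever it overlaps a homology-supported call on the same strand) is a deliberate simplification. The abstract says the real tool also modifies predictions using the evidence, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene call; field names are assumptions for this sketch."""
    start: int   # 1-based inclusive start coordinate
    end: int     # 1-based inclusive end coordinate
    strand: str  # "+" or "-"
    source: str  # "homology" or "ab_initio" (labels invented here)


def overlaps(a: Gene, b: Gene) -> bool:
    """Return True if two calls on the same strand share at least one base."""
    return a.strand == b.strand and a.start <= b.end and b.start <= a.end


def merge_calls(homology: list[Gene], ab_initio: list[Gene]) -> list[Gene]:
    """Keep every homology-supported call, and add an ab initio call only
    where no homology-supported call overlaps it."""
    merged = list(homology)
    for pred in ab_initio:
        if not any(overlaps(pred, ev) for ev in homology):
            merged.append(pred)
    # Sort by coordinate so the result reads like an annotation track.
    return sorted(merged, key=lambda g: (g.start, g.end))


if __name__ == "__main__":
    evidence = [Gene(100, 400, "+", "homology")]
    predicted = [
        Gene(150, 420, "+", "ab_initio"),  # conflicts with the evidence
        Gene(600, 900, "-", "ab_initio"),  # no external evidence here
    ]
    for gene in merge_calls(evidence, predicted):
        print(gene)
```

In this toy version the conflicting prediction is simply discarded, so regions with confident comparative data are governed by similarity and the remaining regions by prediction, which is the balance the abstract describes.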


Identifiers

DOI: 10.1093/nar/gkw569

Citations and references

Citations: 3
References used: 0

Metrics (AkademScholar)