Article

CD-HIT: accelerated for clustering the next-generation sequencing data

LiMin Fu, Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
Beifang Niu, Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
Zhengwei Zhu, Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
Sitao Wu, Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
Weizhong Li, Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA

2012, en

ABI

Abstract

SUMMARY: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. AVAILABILITY: http://cd-hit.org. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Identifiers

DOI: 10.1093/bioinformatics/bts565

Citations and references

Citations: 7
References used: 0

Metrics - AkademScholar