Article

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin, Carrie Davis, Felix Schlesinger, Jörg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, T Gingeras

Affiliations: Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Pacific Biosciences, Menlo Park, CA, USA

2012, English

ABI

Abstract

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and as yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.

RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning 550 million 2 × 76 bp paired-end reads per hour to the human genome on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80–90% success rate, corroborating the high precision of the STAR mapping strategy.

AVAILABILITY AND IMPLEMENTATION: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
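To make "sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching" concrete, here is a minimal toy sketch in Python. It is not STAR's C++ implementation: it assumes exact matches only, the forward strand only, a naive suffix sort, and a stitching step reduced to reporting the genomic gap between two uniquely mapped, read-adjacent seeds. The genome, the read and all function names (`build_suffix_array`, `max_mappable_prefix`, `sequential_mmp_seeds`, `stitch_unique_seeds`) are hypothetical illustrations.

```python
# Toy sketch only (not STAR's implementation): exact matches, forward strand.

# Hypothetical toy "genome": exon1 ACGTAC, a 6-base gap standing in for an
# intron, then exon2 GGATT.
GENOME = "TT" + "ACGTAC" + "CCCCCC" + "GGATT" + "TT"
# Hypothetical spliced read: exon1 joined directly to exon2.
READ = "ACGTAC" + "GGATT"


def build_suffix_array(text: str) -> list[int]:
    # Naive construction (sort all suffixes); fine for toy inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text: str, sa: list[int], pattern: str, upper: bool) -> int:
    # Binary search over suffixes truncated to len(pattern); these truncated
    # prefixes are non-decreasing in suffix-array order.
    # upper=False -> first index with prefix >= pattern
    # upper=True  -> first index with prefix >  pattern
    lo, hi, m = 0, len(sa), len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def max_mappable_prefix(text, sa, read, start):
    # Longest read[start:start+L] occurring exactly in text, with all loci.
    best_len, best_lo, best_hi = 0, 0, 0
    for length in range(1, len(read) - start + 1):
        pattern = read[start:start + length]
        lo = _bound(text, sa, pattern, upper=False)
        hi = _bound(text, sa, pattern, upper=True)
        if lo == hi:  # no suffix starts with this prefix; longer ones fail too
            break
        best_len, best_lo, best_hi = length, lo, hi
    return best_len, sorted(sa[i] for i in range(best_lo, best_hi))


def sequential_mmp_seeds(text, sa, read):
    # Find the MMP, then restart the search at the first unmapped read base.
    seeds, pos = [], 0
    while pos < len(read):
        length, loci = max_mappable_prefix(text, sa, read, pos)
        if length == 0:  # base absent from the genome (e.g. 'N'): skip it
            pos += 1
            continue
        seeds.append((pos, length, loci))
        pos += length
    return seeds


def stitch_unique_seeds(seeds):
    # Naive stitching: for read-adjacent seeds that each map to one locus,
    # report the genomic gap between them as a half-open interval.
    gaps = []
    for (r1, len1, g1), (r2, _len2, g2) in zip(seeds, seeds[1:]):
        if len(g1) != 1 or len(g2) != 1:
            continue  # multimapping seeds would need real clustering
        end1 = g1[0] + len1
        if r2 == r1 + len1 and g2[0] > end1:
            gaps.append((end1, g2[0]))
    return gaps


if __name__ == "__main__":
    sa = build_suffix_array(GENOME)
    seeds = sequential_mmp_seeds(GENOME, sa, READ)
    print(seeds)                       # [(0, 6, [2]), (6, 5, [14])]
    print(stitch_unique_seeds(seeds))  # [(8, 14)]
```

Restarting the search at the first unmapped base is what lets the second seed land on the far side of the junction without knowing the junction in advance. The naive suffix sort and the one-base-at-a-time prefix extension are chosen for readability, not speed.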


Identifiers

DOI: 10.1093/bioinformatics/bts635

Citations and references

Citations: 2. References used: 0.

Metrics: AkademScholar