Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons

  Bruce W. Birren1

  1 Genome Sequencing and Analysis Program, The Broad Institute, Cambridge, Massachusetts 02142, USA;
  2 Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA;
  3 Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas 77030, USA;
  4 The Genome Center, Washington University School of Medicine, St. Louis, Missouri 63108, USA;
  5 Human Genomic Medicine, J. Craig Venter Institute, Rockville, Maryland 20850, USA;
  6 Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA;
  7 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA;
  8 Howard Hughes Medical Institute, University of Colorado, Boulder, Colorado 80309, USA

    Abstract

    Bacterial diversity among environmental samples is commonly assessed with PCR-amplified 16S rRNA gene (16S) sequences. Perceived diversity, however, can be influenced by sample preparation, primer selection, and formation of chimeric 16S amplification products. Chimeras are hybrid products between multiple parent sequences that can be falsely interpreted as novel organisms, thus inflating apparent diversity. We developed a new chimera detection tool called Chimera Slayer (CS). CS detects chimeras with greater sensitivity than previous methods, performs well on short sequences such as those produced by the 454 Life Sciences (Roche) Genome Sequencer, and can scale to large data sets. By benchmarking CS performance against sequences derived from a controlled DNA mixture of known organisms and a simulated chimera set, we provide insights into the factors that affect chimera formation such as sequence abundance, the extent of similarity between 16S genes, and PCR conditions. Chimeras were found to reproducibly form among independent amplifications and contributed to false perceptions of sample diversity and the false identification of novel taxa, with less-abundant species exhibiting chimera rates exceeding 70%. Shotgun metagenomic sequences of our mock community appear to be devoid of 16S chimeras, supporting a role for shotgun metagenomics in validating novel organisms discovered in targeted sequence surveys.
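
    The idea of a chimera as a hybrid of two parent sequences can be illustrated with a minimal sketch. This is not the Chimera Slayer algorithm; it is a toy example that assumes the query and both candidate parents are already aligned to equal length. The function names, the toy sequences, and the `min_gain` threshold are all hypothetical choices made for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def make_chimera(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Simulate a two-parent chimera: left part from parent_a, right part from parent_b."""
    return parent_a[:breakpoint] + parent_b[breakpoint:]


def best_two_parent_split(query: str, parent_a: str, parent_b: str) -> tuple[int, float]:
    """Scan every breakpoint and both parent orders; return the (breakpoint, identity)
    of the two-parent model that best matches the query."""
    best = (0, 0.0)
    for bp in range(1, len(query)):
        for left, right in ((parent_a, parent_b), (parent_b, parent_a)):
            model = left[:bp] + right[bp:]
            score = identity(query, model)
            if score > best[1]:
                best = (bp, score)
    return best


def looks_chimeric(query: str, parent_a: str, parent_b: str, min_gain: float = 0.05) -> bool:
    """Flag the query when a two-parent model explains it better than either parent
    alone by at least min_gain (an arbitrary, assumed threshold)."""
    single = max(identity(query, parent_a), identity(query, parent_b))
    _, combined = best_two_parent_split(query, parent_a, parent_b)
    return combined - single >= min_gain


# Toy parents chosen so the breakpoint is easy to see.
parent_a = "A" * 20
parent_b = "C" * 20
chimera = make_chimera(parent_a, parent_b, 10)

print(best_two_parent_split(chimera, parent_a, parent_b))
print(looks_chimeric(chimera, parent_a, parent_b))
print(looks_chimeric(parent_a, parent_a, parent_b))
```

    In this sketch a sequence is flagged only when a two-parent model fits it noticeably better than the best single parent. Real 16S sequences would first require alignment and a search for candidate parents in a reference database, both of which are outside the scope of this example.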

    Footnotes

    • 9 Corresponding author.

      E-mail bhaas{at}broadinstitute.org.

    • [Supplemental material is available for this article. The sequence data from this study have been submitted to the NCBI Entrez Genome Project database (http://www.ncbi.nlm.nih.gov/genomeprj) under ID nos. 48465, 48471, 53501, and 60767. Software tools and data sets are freely available at http://microbiomeutil.sourceforge.net.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.112730.110.

    • Received July 11, 2010.
    • Accepted December 29, 2010.

    Freely available online through the Genome Research Open Access option.
