  • Analysis

Bias, robustness and scalability in single-cell differential expression analysis

Abstract

Many methods have been used to determine differential gene expression from single-cell RNA (scRNA)-seq data. We evaluated 36 approaches using experimental and synthetic data and found considerable differences in the number and characteristics of the genes that are called differentially expressed. Prefiltering of lowly expressed genes has important effects, particularly for some of the methods developed for bulk RNA-seq data analysis. However, we found that bulk RNA-seq analysis methods do not generally perform worse than those developed specifically for scRNA-seq. We also present conquer, a repository of consistently processed, analysis-ready public scRNA-seq data sets that is aimed at simplifying method evaluation and reanalysis of published results. Each data set provides abundance estimates for both genes and transcripts, as well as quality control and exploratory analysis reports.
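
As a rough illustration of what prefiltering lowly expressed genes can look like, the sketch below keeps only genes whose expression exceeds a cutoff in more than a given fraction of cells. The function name `prefilter_genes`, the cutoff values and the toy data are assumptions made for this example; the abstract does not state the filtering rule used in the study.

```python
from typing import Dict, List


def prefilter_genes(
    expr: Dict[str, List[float]],
    min_value: float = 1.0,      # illustrative expression cutoff (assumption)
    min_fraction: float = 0.25,  # illustrative fraction of cells (assumption)
) -> List[str]:
    """Return genes expressed above min_value in more than min_fraction of cells."""
    kept = []
    for gene, values in expr.items():
        if not values:
            continue  # skip genes with no measurements
        n_expressed = sum(1 for v in values if v > min_value)
        if n_expressed / len(values) > min_fraction:
            kept.append(gene)
    return kept


# Toy expression values (one list entry per cell); not real data.
toy = {
    "geneA": [0.0, 0.0, 0.0, 0.0, 5.2],  # above cutoff in 1 of 5 cells
    "geneB": [3.1, 0.0, 2.4, 7.9, 0.0],  # above cutoff in 3 of 5 cells
    "geneC": [0.0, 0.0, 0.0, 0.0, 0.0],  # never above cutoff
}

kept = prefilter_genes(toy)
print(kept)
```

Because a filter of this kind changes which genes enter the test, applying a DE method with and without it is one way to examine the sensitivity to prefiltering that the abstract describes.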

Figure 1: Type I error control across several instances from eight single-cell null data sets.
Figure 2: Characteristics of genes falsely called significant by DE methods.
Figure 3: Average similarities between gene rankings obtained by the evaluated DE methods.
Figure 4: Differential expression detection performance, summarized across all instances of the three simulated data sets.
Figure 5: Summary of DE method performance across all major evaluation criteria.

Acknowledgements

The authors acknowledge M. Love and V. Svensson for helpful online instructions regarding automated download of raw data from ENA. This study was supported by the Forschungskredit of the University of Zurich, grant no. FK-16-107 to C.S.

Author information

Contributions

C.S. and M.D.R. designed analyses and wrote the manuscript. C.S. performed analyses. Both authors read and approved the final manuscript.

Corresponding authors

Correspondence to Charlotte Soneson or Mark D Robinson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–32 and Supplementary Tables 1–3 (PDF 15651 kb)

Life Sciences Reporting Summary (PDF 213 kb)

Supplementary Data

countsimQC reports, illustrating the similarity between each simulated data set and the respective underlying real data set (ZIP 18523 kb)

Supplementary Software

Snapshot (at time of publication) of the two GitHub repositories containing the code used to build the conquer database (https://github.com/markrobinsonuzh/conquer) and to perform the method comparison (https://github.com/csoneson/conquer_comparison) (ZIP 4213 kb)

About this article

Cite this article

Soneson, C., Robinson, M. Bias, robustness and scalability in single-cell differential expression analysis. Nat Methods 15, 255–261 (2018). https://doi.org/10.1038/nmeth.4612

  • DOI: https://doi.org/10.1038/nmeth.4612
