  • Brief Communication

PyClone: statistical inference of clonal population structure in cancer

Abstract

We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
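To make the idea of allelic imbalance concrete, the sketch below shows how a mutation's expected variant allele frequency depends on its cellular prevalence, the local copy number and normal-cell contamination. It uses a common simplified relationship and is not taken from the paper: it assumes every tumour cell shares the same total copy number at the locus, and the function and parameter names (`expected_vaf`, `prevalence_from_vaf`, `tumour_content`, `total_cn`, `mutant_cn`, `normal_cn`) are hypothetical. PyClone's actual likelihood may differ.

```python
def expected_vaf(prevalence, tumour_content, total_cn, mutant_cn, normal_cn=2):
    """Expected variant allele frequency under a simplified mixture model.

    Assumption (not from the paper): all tumour cells carry `total_cn`
    copies of the locus, and mutated cells carry `mutant_cn` mutant copies.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if not 0.0 < tumour_content <= 1.0:
        raise ValueError("tumour_content must lie in (0, 1]")
    if not 0 < mutant_cn <= total_cn:
        raise ValueError("mutant_cn must lie in (0, total_cn]")
    # Alleles carrying the mutation, per average cell in the sample.
    mutant_alleles = tumour_content * prevalence * mutant_cn
    # All alleles at the locus, from normal and tumour cells combined.
    all_alleles = (1.0 - tumour_content) * normal_cn + tumour_content * total_cn
    return mutant_alleles / all_alleles


def prevalence_from_vaf(vaf, tumour_content, total_cn, mutant_cn, normal_cn=2):
    """Invert expected_vaf: recover cellular prevalence from an observed VAF."""
    all_alleles = (1.0 - tumour_content) * normal_cn + tumour_content * total_cn
    return vaf * all_alleles / (tumour_content * mutant_cn)


# A heterozygous diploid mutation present in every cell of a pure tumour
# has an expected VAF of one half.
print(expected_vaf(prevalence=1.0, tumour_content=1.0, total_cn=2, mutant_cn=1))
```

Under these assumptions the same observed allele frequency can correspond to different cellular prevalences once copy number or tumour content changes, which is why a clustering method has to model both rather than cluster raw allele frequencies.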

Figure 1: Comparison of clustering performance for the mixture of normal-tissue data sets.
Figure 2: Joint analysis of multiple samples from high-grade serous ovarian cancer 2.

Acknowledgements

This work is funded by Canadian Institutes for Health Research (CIHR), Genome Canada, Genome British Columbia, Canadian Cancer Society Research Institute and Canadian Breast Cancer Foundation grants to S.P.S. and S.A. S.P.S. is supported by the Michael Smith Foundation for Health Research and is the Canada Research Chair (CRC) for Computational Cancer Genomics. S.A. is the CRC for Molecular Oncology. A.R. is supported by a CIHR Banting scholarship.

Author information

Contributions

Project conception and oversight: S.P.S., S.A., A.R.; method development: A.R., A.B.-C., S.P.S.; implementation and benchmarking: A.R.; manuscript writing and editing, study design and execution: A.R., A.B.-C., S.P.S., S.A.; single-cell sequencing: J.K., D.Y., A.W., E.L., J.B.; data analysis and interpretation: G.H.

Corresponding author

Correspondence to Sohrab P Shah.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Results, Supplementary Discussion and Supplementary Note (PDF 5370 kb)

Supplementary Table 1

Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 2. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed as the mean of the post-burn-in trace of the cellular prevalence parameter for each method. The standard deviation of the cellular prevalence parameter, estimated from the post-burn-in trace, is also included. Cluster ids (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation ids list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLS 50 kb)

Supplementary Table 2

Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 1. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed as the mean of the post-burn-in trace of the cellular prevalence parameter for each method. The standard deviation of the cellular prevalence parameter, estimated from the post-burn-in trace, is also included. Cluster ids (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation ids list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLSX 40 kb)
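The point estimates described for both supplementary tables (mean and standard deviation of the post-burn-in trace) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `summarise_trace`, the example trace and the burn-in length are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the table descriptions do not say which estimator was used.

```python
from statistics import mean, stdev


def summarise_trace(trace, burnin):
    """Return (mean, standard deviation) of an MCMC trace after burn-in.

    `trace` holds successive sampled cellular prevalences for one mutation;
    the first `burnin` samples are discarded before summarising.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    if burnin < 0:
        raise ValueError("burnin must be non-negative")
    kept = list(trace)[burnin:]
    if len(kept) < 2:
        raise ValueError("need at least two post-burn-in samples")
    return mean(kept), stdev(kept)


# Hypothetical trace: the first two samples are treated as burn-in.
example_trace = [0.9, 0.1, 0.5, 0.4, 0.6, 0.5]
prevalence_mean, prevalence_sd = summarise_trace(example_trace, burnin=2)
print(f"mean={prevalence_mean:.3f} sd={prevalence_sd:.3f}")
```

Discarding the early samples keeps the summary from being dominated by the sampler's starting point; how many to discard is a judgment call that depends on how quickly the chain settles.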

About this article

Cite this article

Roth, A., Khattra, J., Yap, D. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 11, 396–398 (2014). https://doi.org/10.1038/nmeth.2883

  • DOI: https://doi.org/10.1038/nmeth.2883
