  • Letter

Quantifiable predictive features define epitope-specific T cell receptor repertoires

Abstract

T cells are defined by a heterodimeric surface receptor, the T cell receptor (TCR), that mediates recognition of pathogen-associated epitopes through interactions with peptide and major histocompatibility complexes (pMHCs). TCRs are generated by genomic rearrangement of the germline TCR locus, a process termed V(D)J recombination, that has the potential to generate marked diversity of TCRs (estimated to range from 10^15 (ref. 1) to as high as 10^61 (ref. 2) possible receptors). Despite this potential diversity, TCRs from T cells that recognize the same pMHC epitope often share conserved sequence features, suggesting that it may be possible to predictively model epitope specificity. Here we report the in-depth characterization of ten epitope-specific TCR repertoires of CD8+ T cells from mice and humans, representing over 4,600 in-frame single-cell-derived TCRαβ sequence pairs from 110 subjects. We developed analytical tools to characterize these epitope-specific repertoires: a distance measure on the space of TCRs that permits clustering and visualization, a robust repertoire diversity metric that accommodates the low number of paired public receptors observed when compared to single-chain analyses, and a distance-based classifier that can assign previously unobserved TCRs to characterized repertoires with robust sensitivity and specificity. Our analyses demonstrate that each epitope-specific repertoire contains a clustered group of receptors that share core sequence similarities, together with a dispersed set of diverse ‘outlier’ sequences. By identifying shared motifs in core sequences, we were able to highlight key conserved residues driving essential elements of TCR recognition. These analyses provide insights into the generalizable, underlying features of epitope-specific repertoires and adaptive immune recognition.

Figure 1: V and J gene segment usage and covariation in epitope-specific responses.
Figure 2: TCRdist analysis of the M45 repertoire identifies clusters of related receptors.
Figure 3: Enriched CDR3 sequence motifs define key features of epitope specificity.
Figure 4: Quantifying the defining features of epitope-specific populations.

References

  1. Davis, M. M. & Bjorkman, P. J. T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988)

  2. Mora, T. & Walczak, A. M. Quantifying lymphocyte receptor diversity. bioRxiv 046870 (2016)

  3. Giraud, M. et al. Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing. BMC Genomics 15, 409 (2014)

  4. Alamyar, E., Giudicelli, V., Li, S. & Duroux, P. IMGT/HighV-QUEST: the IMGT® web portal for immunoglobulin (IG) or antibody and T cell receptor (TR) analysis from NGS high throughput and deep sequencing. Immunomethods 882, 569–604 (2012)

  5. Bolotin, D. A. et al. MiTCR: software for T-cell receptor sequencing data analysis. Nat. Methods 10, 813–814 (2013)

  6. Gerritsen, B., Pandit, A., Andeweg, A. C. & de Boer, R. J. RTCR: a pipeline for complete and accurate recovery of T cell repertoires from high throughput sequencing data. Bioinformatics 32, 3098–3106 (2016)

  7. Turner, S. J., Doherty, P. C., McCluskey, J. & Rossjohn, J. Structural determinants of T-cell receptor bias in immunity. Nat. Rev. Immunol. 6, 883–894 (2006)

  8. Li, H. et al. Recombinatorial biases and convergent recombination determine interindividual TCRβ sharing in murine thymocytes. J. Immunol. 189, 2404–2413 (2012)

  9. Venturi, V. et al. Sharing of T cell receptors in antigen-specific responses is driven by convergent recombination. Proc. Natl Acad. Sci. USA 103, 18691–18696 (2006)

  10. Genolet, R., Stevenson, B. J., Farinelli, L., Osterås, M. & Luescher, I. F. Highly diverse TCRα chain repertoire of pre-immune CD8+ T cells reveals new insights in gene recombination. EMBO J. 31, 1666–1678 (2012)

  11. Ruggiero, E. et al. High-resolution analysis of the human T-cell receptor repertoire. Nat. Commun. 6, 8081 (2015)

  12. Ndifon, W. et al. Chromatin conformation governs T-cell receptor Jβ gene segment usage. Proc. Natl Acad. Sci. USA 109, 15865–15870 (2012)

  13. Howie, B. et al. High-throughput pairing of T cell receptor α and β sequences. Sci. Transl. Med. 7, 301ra131 (2015)

  14. Cinelli, M. et al. Feature selection using a one dimensional naive Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires. Bioinformatics 33, 951–955 (2017)

  15. Thomas, N. et al. Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence. Bioinformatics 30, 3181–3188 (2014)

  16. Day, E. B. et al. Structural basis for enabling T-cell receptor diversity within biased virus-specific CD8+ T-cell responses. Proc. Natl Acad. Sci. USA 108, 9536–9541 (2011)

  17. Miles, J. J. et al. Genetic and structural basis for selection of a ubiquitous T cell receptor deployed in Epstein–Barr virus. PLoS Pathog. 6, e1001198 (2011)

  18. Stewart-Jones, G. B. E., McMichael, A. J., Bell, J. I., Stuart, D. I. & Jones, E. Y. A structural basis for immunodominant human T cell receptor recognition. Nat. Immunol. 4, 657–663 (2003)

  19. Ishizuka, J. et al. The structural dynamics and energetics of an immunodominant T cell receptor are programmed by its Vβ domain. Immunity 28, 171–182 (2008)

  20. La Gruta, N. L. et al. Epitope-specific TCRβ repertoire diversity imparts no functional advantage on the CD8+ T cell response to cognate viral peptides. Proc. Natl Acad. Sci. USA 105, 2034–2039 (2008)

  21. Rudd, B. D., Venturi, V., Davenport, M. P. & Nikolich-Zugich, J. Evolution of the antigen-specific CD8+ TCR repertoire across the life span: evidence for clonal homogenization of the old TCR repertoire. J. Immunol. 186, 2056–2064 (2011)

  22. Venturi, V., Kedzierska, K., Turner, S. J., Doherty, P. C. & Davenport, M. P. Methods for comparing the diversity of samples of the T cell receptor repertoire. J. Immunol. Methods 321, 182–195 (2007)

  23. Li, B. et al. Landscape of tumor-infiltrating T cell repertoire of human cancers. Nat. Genet. 48, 725–732 (2016)

  24. Parkhurst, M. R. et al. Isolation of T cell receptors specifically reactive with mutated tumor associated antigens from tumor infiltrating lymphocytes based on CD137 expression. Clin. Cancer Res. 23, 2491–2505 (2016)

  25. Pasetto, A. et al. Tumor- and neoantigen-reactive T-cell receptors can be identified based on their frequency in fresh tumor. Cancer Immunol. Res. 4, 734–743 (2016)

  26. Tran, E. et al. Cancer immunotherapy based on mutation-specific CD4+ T cells in a patient with epithelial cancer. Science 344, 641–645 (2014)

  27. Dash, P. et al. Paired analysis of TCRα and TCRβ chains at the single-cell level in mice. J. Clin. Invest. 121, 288–295 (2011)

  28. Wang, G. C., Dash, P., McCullers, J. A., Doherty, P. C. & Thomas, P. G. T cell receptor αβ diversity inversely correlates with pathogen-specific antibody levels in human cytomegalovirus infection. Sci. Transl. Med. 4, 128ra42 (2012)

  29. Dash, P., Wang, G. C. & Thomas, P. G. Single-cell analysis of T-cell receptor αβ repertoire. Methods Mol. Biol. 1343, 181–197 (2015)

  30. Guo, X.-Z. J. et al. Rapid cloning, expression, and functional characterization of paired αβ and γδ T-cell receptor chains from single-cell analysis. Mol. Ther. Methods Clin. Dev. 3, 15054 (2016)

  31. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)

  32. Lefranc, M.-P. et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. 37, D1006–D1012 (2009)

  33. Putintseva, E. V. et al. Mother and child T cell receptor repertoires: deep profiling study. Front. Immunol. 4, 463 (2013)

  34. Kullback, S. & Leibler, R. A. On Information and Sufficiency. Ann. Math. Stat. 22, 79–86 (1951)

  35. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37, 145–151 (1991)

  36. Vinh, N. X., Julien, E. & James, B. Information theoretic measures for clusterings comparison. in Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09 (2009). doi:10.1145/1553374.1553511

  37. Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA 89, 10915–10919 (1992)

  38. Rokach, L., Lior, R. & Oded, M. in Data Mining and Knowledge Discovery Handbook 321–352 (2005)

  39. Cukalac, T. et al. Paired TCRαβ analysis of virus-specific CD8+ T cells exposes diversity in a previously defined ‘narrow’ repertoire. Immunol. Cell Biol. 93, 804–814 (2015)

Acknowledgements

We would like to thank the St Jude Children’s Research Hospital Animal Resource Center’s staff for their support and excellent animal care. We thank G. Lennon for the help with single-cell sorting. We thank the Hartwell Center at St Jude for sequencing support. We also thank M. Morris, L. McLaren, T. H. Oguin III, W. Awad, A. Zamora, D. Boyd, X. Guo, S. Valkenburg, E. Grant, N. Bird and N. Mifsud for their help in conducting experiments and preparation of the manuscript. The work was supported by NIH grant AI107625 and ALSAC (to P.G.T.), FHCRC internal development funding to P.B., and an NHMRC Program Grant (1071916) to K.K. and N.L.L. N.L.L. is the recipient of a Sylvia and Charles Viertel Senior Medical Research Fellowship. E.B.C. is an NHMRC Peter Doherty Fellow and K.K. is an NHMRC SRF Level B Fellow. G.C.W. was the recipient of National Institute on Aging (NIA) K23 AG033113, NIA P30 AG021334, John A. Hartford Foundation’s Center of Excellence in Geriatric Medicine Scholars Award, and Johns Hopkins Biology of Healthy Aging Program.

Author information

Contributions

P.D., A.G., T.H., P.B. and P.G.T. wrote the manuscript and designed figures. P.D., G.C.W., S.S. and P.G.T. designed experiments. P.D., G.C.W., A.S. and S.S. conducted experiments. P.D., G.C.W., S.S. and A.S. acquired data. P.D., G.C.W., S.S., A.S., J.C.C., B.C., T.H.O.N., K.K., N.L.L., P.B. and P.G.T. analysed data. P.D., A.G., T.H., P.B. and P.G.T. interpreted data. P.D., A.G., T.H., A.S., P.B., G.C.W., K.K., N.L.L., P.G.T. and J.C.C. edited the manuscript. All authors approved the final manuscript.

Corresponding authors

Correspondence to Philip Bradley or Paul G. Thomas.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information: Nature thanks B. Chain and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 CDR3 region characteristics of 10 epitope-specific TCR repertoires.

a, Paired TCR sequences derived from epitope-specific CD8+ T cells were analysed for CDR3 length, charge, hydrophobicity, and inferred number of junctional nucleotide insertions for both single and paired chains as shown in the histograms. Different epitopes are colour-coded (described in the legend). b, Correlation between CDR3αβ and antigenic peptides for charge, hydrophobicity, length, and N-insertions observed in all 10 epitopes. A summary of the number of subjects, total number of TCR sequences, and unique TCR clones analysed for each epitope is shown in Extended Data Table 1.

Extended Data Figure 2 V and J gene segment usage and covariation in epitope-specific responses.

Gene segment usage and gene–gene pairing landscapes are illustrated graphically using four vertical stacks (one for each V and J segment) connected by curved segments with thickness proportional to the number of TCRs with the respective gene pairing (each panel is labelled with the four gene segments atop their respective colour stacks and the epitope identifier in the top middle). Genes are coloured by frequency within the repertoire with a fixed colour sequence used throughout the manuscript which begins red (most frequent), green (second most frequent), blue, cyan, magenta, and black. Clonally expanded TCRs were reduced to a single data point for this analysis. The number of clones is indicated to the left of each panel. The enrichment of gene segments relative to background frequencies is indicated by up or down arrows, with each successive arrowhead corresponding to an additional twofold deviation (for example, one arrowhead = twofold enrichment, two arrowheads = fourfold enrichment).
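The arrowhead convention in this legend is simple arithmetic on the fold change, and a short sketch may make it concrete. The function name `arrowheads` and the choice to truncate partial twofold steps are assumptions made for illustration; the legend does not say how intermediate values are rounded.

```python
import math


def arrowheads(observed_freq: float, background_freq: float) -> int:
    """Signed arrowhead count for a gene segment.

    Positive values mean enrichment (up arrows), negative values mean
    depletion (down arrows). Each arrowhead is one full twofold step;
    partial steps are truncated towards zero (an assumption).
    """
    log2_fold = math.log2(observed_freq / background_freq)
    return int(log2_fold)


# Fourfold enrichment gives two up arrowheads; twofold depletion gives one down.
print(arrowheads(0.5, 0.125))   # 2
print(arrowheads(0.125, 0.25))  # -1
```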

Extended Data Figure 3 Schematic overview of the TCRdist calculation.

Each of the two TCRs being compared is first mapped to the amino acid sequence of its CDR loops (CDR1, CDR2, and CDR3 as well as an additional variable loop here labelled ‘CDR2.5’), as indicated by the black arrows leading from the coloured loop regions in the receptor structures to the corresponding amino acid sequences in the middle of the diagram. These CDR sequences are aligned based on the IMGT reference (ref. 32) multiple sequence alignments, and a distance score (‘AAdist’) is computed for each position in the alignment using the BLOSUM62 similarity matrix according to the formula given in the box at the bottom left. The AAdist scores are weighted as shown in the ‘weight’ row (thereby increasing the contribution of the CDR3 regions) and summed to produce the final TCRdist score (shown at the right).
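The calculation described in this legend can be sketched for a single chain as follows. This is a minimal illustration and not the authors' implementation: the AAdist formula and the weight row are in the figure rather than in the text, so the capped transform `min(4, 4 - similarity)`, the gap penalties and the loop weights in `LOOP_PARAMS` are all assumptions. The `toy_similarity` function is a hypothetical stand-in for a BLOSUM62 lookup, and the loop sequences are invented and assumed to be pre-aligned with `-` as the gap character.

```python
from typing import Callable, Dict

GAP = "-"

# loop name -> (weight, gap penalty). Assumed values: the legend states only
# that the CDR3 contribution is increased.
LOOP_PARAMS = {"cdr1": (1, 4), "cdr2": (1, 4), "cdr2.5": (1, 4), "cdr3": (3, 8)}


def aa_dist(a: str, b: str, similarity: Callable[[str, str], int],
            gap_penalty: int) -> int:
    """Per-position distance: 0 for a match, a fixed penalty for a gap,
    otherwise a capped transform of the similarity score (assumed form)."""
    if a == b:
        return 0
    if GAP in (a, b):
        return gap_penalty
    return min(4, 4 - similarity(a, b))


def tcr_dist(tcr_a: Dict[str, str], tcr_b: Dict[str, str],
             similarity: Callable[[str, str], int]) -> int:
    """Weighted sum of per-position distances over all aligned loops."""
    total = 0
    for loop, (weight, gap_penalty) in LOOP_PARAMS.items():
        seq_a, seq_b = tcr_a[loop], tcr_b[loop]
        if len(seq_a) != len(seq_b):
            raise ValueError(f"{loop} sequences must be aligned to equal length")
        total += weight * sum(
            aa_dist(a, b, similarity, gap_penalty) for a, b in zip(seq_a, seq_b)
        )
    return total


# Hypothetical similarity scores for two residue pairs; everything else is 0.
TOY_SCORES = {frozenset("DE"): 2, frozenset("IL"): 2}


def toy_similarity(a: str, b: str) -> int:
    return TOY_SCORES.get(frozenset((a, b)), 0)


# Invented, pre-aligned loop sequences for two receptors.
tcr_a = {"cdr1": "DLA", "cdr2": "SQ", "cdr2.5": "KT", "cdr3": "CAL-F"}
tcr_b = {"cdr1": "ELA", "cdr2": "SQ", "cdr2.5": "KT", "cdr3": "CAIGF"}
print(tcr_dist(tcr_a, tcr_b, toy_similarity))  # 32
```

In this toy case CDR1 contributes 2 (one D/E substitution) and CDR3 contributes 3 × (2 + 8) = 30, which shows how a single CDR3 gap can dominate the total under these assumed weights. A paired-chain distance would sum the α-chain and β-chain values.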

Extended Data Figure 4 Two-dimensional projections of mouse epitope-specific TCR repertoires.

Epitope-specific TCR landscapes were projected into two dimensions (2D) using kernel PCA analysis applied to the TCRdist distance matrix: TCRs with small TCRdist values tend to project to nearby points in 2D. The same 2D projection is shown in the four panels of each row, coloured by Vα, Jα, Vβ and Jβ gene segment usage (left to right, respectively). The colours are based on gene frequency in the projected repertoire and follow the same sequence used throughout the manuscript: in decreasing order, 1, red; 2, green; 3, blue; 4, cyan; 5, magenta; 6, black; followed by assorted colours for rare frequencies. A summary of the number of subjects, total number of TCR sequences and unique TCR clones analysed for each epitope is shown in Extended Data Table 1.
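A dependency-free sketch of kernel PCA on a precomputed distance matrix is given here to illustrate the projection step. The legend does not state which kernel was used, so the Gaussian kernel and its `sigma` parameter are assumptions, as are all function names. The eigenvectors are found by plain power iteration, which is adequate for small positive semi-definite matrices but is not what a production pipeline would use.

```python
import math


def gaussian_kernel(dist, sigma):
    """Turn a distance matrix into a similarity (kernel) matrix."""
    return [[math.exp(-(d / sigma) ** 2) for d in row] for row in dist]


def center(K):
    """Double-centre the kernel so the embedded points have zero mean."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    grand = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def top_eigenpairs(M, k, iters=500):
    """Leading k eigenpairs of a symmetric matrix via power iteration
    with deflation."""
    n = len(M)
    M = [row[:] for row in M]
    pairs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]  # remove the component just found
    return pairs


def kernel_pca_2d(dist, sigma):
    """2D coordinates: each eigenvector scaled by the square root of its eigenvalue."""
    pairs = top_eigenpairs(center(gaussian_kernel(dist, sigma)), k=2)
    return [tuple(math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs)
            for i in range(len(dist))]


# Invented distances: receptors 0 and 1 are close, as are 2 and 3.
D = [[0, 10, 200, 200],
     [10, 0, 200, 200],
     [200, 200, 0, 10],
     [200, 200, 10, 0]]
for point in kernel_pca_2d(D, sigma=100.0):
    print(point)
```

With these invented distances the two close pairs land near each other in the plane and far from the other pair, which is the behaviour the legend describes.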

Extended Data Figure 5 Two-dimensional projections and clustering dendrograms of human epitope-specific TCR repertoires.

a, Kernel PCA projections for the three human epitopes, coloured as in Extended Data Fig. 4. b, Average-linkage dendrograms of TCR clusterings for the human repertoires. Each clustering was generated using a fixed-distance-threshold algorithm and coloured by generation probability (red, highest; blue, lowest probability of ease of TCR recombination). The TCR logos for selected receptor subsets (corresponding to the branches of the dendrogram enclosed in dashed boxes) are shown, labelled by cluster size both to the left of each logo and to the right of the corresponding branches. Each TCR logo depicts the V- and J-gene frequencies, the CDR3 amino acid sequence, and the inferred rearrangement structure of the grouped receptors (coloured by source region, light grey for the V-region, dark grey for J, black for D, and red for N-insertions; details in Methods). A summary of the number of subjects, total number of TCR sequences and unique TCR clones analysed for each epitope is shown in Extended Data Table 1.

Extended Data Figure 6 Clustering dendrograms of mouse epitope-specific TCR repertoires.

Each mouse epitope-specific TCR repertoire not depicted in main text Fig. 2 was clustered using a fixed-distance-threshold clustering algorithm and represented as a dendrogram coloured by generation probability (red, highest; blue, lowest probability of ease of TCR recombination), with TCR logos for selected receptor subsets (corresponding to the branches of the dendrogram enclosed in dashed boxes), labelled by cluster size both to the left of each logo and to the right of the corresponding branches. Each TCR logo depicts the V- and J-gene frequencies, the CDR3 amino acid sequence, and the inferred rearrangement structure of the grouped receptors (coloured by source region, light grey for the V-region, dark grey for J, black for D, and red for N-insertions; details in Methods). A summary of the number of subjects, total number of TCR sequences and unique TCR clones analysed for each epitope is shown in Extended Data Table 1.
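One way to realize a fixed-distance-threshold clustering is the greedy scheme sketched here. The legends name the algorithm but do not specify it, so this variant is an assumption: the function name, the centre-selection rule and the tie-breaking are illustrative choices, not the procedure given in the Methods.

```python
from typing import List, Sequence, Tuple


def greedy_threshold_clusters(
    dist: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, List[int]]]:
    """Greedy clustering on a precomputed distance matrix.

    Repeatedly picks the unassigned item with the most unassigned
    neighbours within `threshold`, makes it a cluster centre, and removes
    the centre and those neighbours from further consideration.
    Returns (centre index, sorted member indices) per cluster.
    """
    unassigned = set(range(len(dist)))
    clusters = []
    while unassigned:
        def neighbours(i: int) -> List[int]:
            return [j for j in unassigned if dist[i][j] <= threshold]

        # Ties are broken in favour of the lowest index.
        centre = max(sorted(unassigned), key=lambda i: len(neighbours(i)))
        members = sorted(neighbours(centre))
        clusters.append((centre, members))
        unassigned -= set(members)
    return clusters


# Invented distances: items 0-2 form one tight group, items 3-4 another.
D = [[0, 20, 20, 100, 100],
     [20, 0, 20, 100, 100],
     [20, 20, 0, 100, 100],
     [100, 100, 100, 0, 20],
     [100, 100, 100, 20, 0]]
print(greedy_threshold_clusters(D, threshold=50))
# [(0, [0, 1, 2]), (3, [3, 4])]
```

A receptor with no neighbour inside the threshold ends up as a singleton cluster, which corresponds to the dispersed 'outlier' sequences mentioned in the abstract.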

Extended Data Figure 7 TCR logo representations of CDR3 α and β sequence motifs.

The results of our CDR3 motif discovery algorithm were visualized using a TCR logo that summarizes V and J usage, CDR3 amino acid enrichment, and inferred rearrangement structures. The motif sequence logo is shown at full height (top) and scaled (bottom) by per-column relative entropy to background frequencies derived from TCRs with matching gene-segment composition in order to highlight motif positions under selection. The motif chi-squared score (see Methods) and the fraction of the repertoire matched are given below the J-gene logo. A summary of the number of subjects, total number of TCR sequences and unique TCR clones analysed for each epitope is shown in Extended Data Table 1.
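The per-column scaling mentioned in this legend is a relative entropy (Kullback–Leibler divergence, ref. 34) between the residue frequencies observed in a motif column and the background frequencies. A minimal sketch, assuming the result is reported in bits and that every observed residue has a non-zero background frequency; the function name and the example frequencies are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def column_relative_entropy(column: str, background: Dict[str, float]) -> float:
    """Relative entropy (in bits) of one alignment column against background.

    `column` holds the residues seen at one motif position, one character per
    receptor. Every residue in `column` must have a non-zero entry in
    `background`.
    """
    counts = Counter(column)
    n = len(column)
    return sum(
        (count / n) * math.log2((count / n) / background[aa])
        for aa, count in counts.items()
    )


# A fully conserved glycine where background expects it 25% of the time
# scores 2 bits; a column that matches background exactly scores 0.
print(column_relative_entropy("GGGG", {"G": 0.25, "A": 0.75}))  # 2.0
print(column_relative_entropy("GA", {"G": 0.5, "A": 0.5}))      # 0.0
```

Columns that score near zero look like background and shrink in the scaled logo, while strongly selected positions keep their height.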

Extended Data Figure 8 Quantifying the defining features of epitope-specific populations.

a, TCRdiv diversity measures. b, The area under the ROC curves (AUROC), a standard measure of classification success. c, Correlations between the discrimination AUROC and the TCRdiv diversity measure at single and paired chain level. d, Correlation between repertoire sampling density and generation probability. Nearest-neighbours sampling metric for all TCRs in the dataset (x axis) is plotted against an estimated generation probability (y axis) based on a simple model of the rearrangement process that accounts for distance from germ line and convergent recombination. The distributions of each measure were normalized (ranked by percentile) within each dataset so that global differences between repertoires do not influence the correlation. e, Quantifying the defining features of human epitope-specific responses. Smoothed, nearest-neighbour distance distributions with respect to the labelled repertoire are plotted in the left three columns for epitope-specific TCRs (red curves) and randomly selected background TCRs (blue curves); TCRdist distances were calculated over the α chain (column 1), the β chain (column 2), or the full receptor (column 3). Plotted in columns 4–6 are receiver operating characteristic (ROC) curves assessing the performance of neighbour-distance as a TCR classifier, comparing sensitivity and specificity in differentiating epitope-specific receptors from randomly selected background receptors (blue ROC curves). Analyses for both single and paired chains are shown, as indicated in the plot labels. A summary of the number of subjects, total number of TCR sequences and unique TCR clones analysed for each epitope is shown in Extended Data Table 1.
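The neighbour-distance classifier and its AUROC can be illustrated with a short sketch. The exact nearest-neighbour score is defined in the Methods and not in this legend, so averaging the `k` smallest distances is an assumption, and both function names are hypothetical. The AUROC is computed through its rank interpretation: the probability that a randomly chosen epitope-specific receptor sits closer to the repertoire than a randomly chosen background receptor.

```python
from typing import Sequence


def nn_score(dists_to_repertoire: Sequence[float], k: int = 1) -> float:
    """Mean of the k smallest distances from one TCR to a repertoire.

    Smaller scores mean the TCR lies in a more densely sampled region.
    """
    nearest = sorted(dists_to_repertoire)[:k]
    return sum(nearest) / len(nearest)


def auroc(positive_scores: Sequence[float],
          negative_scores: Sequence[float]) -> float:
    """AUROC when LOWER scores indicate the positive class.

    Counts the fraction of (positive, negative) pairs in which the positive
    has the smaller score; ties count as half.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Invented scores: epitope-specific TCRs tend to sit closer to the repertoire.
epitope_specific = [10.0, 20.0, 60.0]
background = [50.0, 70.0, 90.0]
print(auroc(epitope_specific, background))  # 8 of 9 pairs ordered correctly
```

When scoring members of the repertoire itself, each receptor would need to be excluded from its own neighbour set; otherwise every epitope-specific score is zero and the AUROC is trivially inflated.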

Extended Data Figure 9 Specificity and avidity of TCRs of the dispersed region of the TCRdist dendrograms.

a, Representative flow plots showing gating strategies of tetramer-positive CD8 T cells from influenza-infected lungs. b, Cloning and expression of clustered and dispersed receptors from the indicated epitopes stained with specific tetramer versus control levels. Representative TCRs from clustered and dispersed regions of the TCRdist dendrogram were cloned, expressed, and tested for binding against specific tetramers. Binding of two non-clustered TCRs from the NP and PB1 epitopes and a TCR from the clustered region of the PB1 epitope is shown. c, The distribution of the tested TCRs (numbered 1–5 corresponding to left-to-right occurrence in b) on an NN-distance plot and d, their V-J usage and CDR3 sequences with NN-distance score are shown. e, Analysis of the mean fluorescence intensities (MFI) of the clustered and dispersed (separated by a visual threshold of 135 NN-distance score) groups of receptors shows no consistent segregation of the avidity. Mean and standard error of the mean are shown. f, PB1-specific TCRs derived from cells sorted by low, intermediate and high gating show overlapping distributions of NN-distance scores (n = 23 (low), 18 (intermediate), 23 (high) cells).

Extended Data Table 1 TCR repertoires

Supplementary information

Supplementary Information

This file contains Supplementary Text and References.

About this article

Cite this article

Dash, P., Fiore-Gartland, A., Hertz, T. et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature 547, 89–93 (2017). https://doi.org/10.1038/nature22383

  • DOI: https://doi.org/10.1038/nature22383
