159 Refining TCR clonotype identification with long-read sequencing technique
  1. Nicholas Da Zhi Ang1,
  2. Wei Lin Tang1,
  3. Javian Zheng Han Ng1,2,
  4. Wendy Lee1,
  5. Solomonraj Wilson3,
  6. Ser Mei Koh1,
  7. Menaka Priyadharsani Rajapakse1,
  8. Xiaoyu Liu1,4,
  9. Bernett Lee5,
  10. Olaf Rotzchke1 and
  11. Mai Chan Lau1,6
  1. 1Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore
  2. 2National University of Singapore (NUS), Singapore
  3. 3Agency for Science, Technology and Research (A*STAR), Singapore
  4. 4Nanyang Technological University (NTU), Singapore
  5. 5Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
  6. 6Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore
  • Journal for ImmunoTherapy of Cancer (JITC) preprint. The copyright holders for this preprint are the authors/funders, who have granted JITC permission to display the preprint. All rights reserved. No reuse allowed without permission.

Abstract

Background Beyond conventional short-read next-generation sequencing (NGS), long-read sequencing technologies such as Oxford Nanopore Technologies (ONT)1 improve the coverage and resolution of genomic fragments, which is crucial for detecting T cell receptors (TCRs), known for the variability produced by VDJ recombination.2 Short-read NGS sequences from the 3’ or 5’ end of the cDNA template; in 3’ short-read sequencing, adapters/barcodes near the constant region hinder capture of the variable region. Long-read sequencing captures the entire TCR region, providing better resolution of the complete CDR3.3 However, long reads have higher error rates4 and therefore require a robust bioinformatic pipeline.5 6 We aim to improve the accuracy and reliability of TCR repertoire reconstruction, enhancing clonotype identification for T-cell vaccine development.

Methods Human PBMCs were processed into full-length TCR libraries using Chromium Next GEM Single Cell 5’ v2 (10x Genomics) and sequenced on an Illumina HiSeq X, producing the single-cell TCR (scTCR) dataset (figure 1A). A separate aliquot of full-length TCR cDNA was sequenced using the Oxford Nanopore Technologies (ONT) Ligation Sequencing Kit V14 and a PromethION Flow Cell (figure 1A), with dorado duplex base-calling, producing the long-read scTCR dataset.

Long-read scTCR cell barcodes were identified as the sequence following the 5’ adaptor (figure 2A). In the scTCR dataset, where adaptors had been removed, barcodes were located using the TSO sequence. Extracted barcodes were validated against the known barcode whitelist,7 and the overlaps were visualized using the VennDiagram R package. Reads were segregated by unique barcode to identify cell-specific TCR alpha/beta chains. Hierarchical clustering was performed on Mash distances8 using Ward’s linkage. TCR alpha/beta clusters were verified using MiXCR-align.9 Sequencing errors were assessed by aligning reads to the MiXCR reference using minimap2,10 and visualized in IGV.
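The two barcode-location strategies and the whitelist check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The adaptor and TSO sequences, the 16 nt barcode length and the 10 nt UMI length are assumptions chosen for illustration and should be confirmed against the kit documentation. The function names are hypothetical, and exact string matching is a simplification that ignores sequencing errors and read orientation.

```python
from typing import Iterable, Optional, Set, Tuple

# Assumed read layout (illustrative values, not taken from the abstract):
# [Read1 adaptor][16 nt cell barcode][10 nt UMI][TSO][TCR transcript ...]
ADAPTOR = "CTACACGACGCTCTTCCGATCT"
TSO = "TTTCTTATATGGG"
BARCODE_LEN = 16
UMI_LEN = 10


def barcode_after_adaptor(read: str, adaptor: str = ADAPTOR,
                          barcode_len: int = BARCODE_LEN) -> Optional[str]:
    """Return the bases immediately downstream of the first exact adaptor match."""
    start = read.find(adaptor)
    if start == -1:
        return None
    begin = start + len(adaptor)
    barcode = read[begin:begin + barcode_len]
    # Reject reads that end before a full-length barcode is available.
    return barcode if len(barcode) == barcode_len else None


def barcode_before_tso(read: str, tso: str = TSO,
                       barcode_len: int = BARCODE_LEN,
                       umi_len: int = UMI_LEN) -> Optional[str]:
    """Return the barcode that sits umi_len + barcode_len bases upstream of the TSO."""
    tso_start = read.find(tso)
    begin = tso_start - umi_len - barcode_len
    if tso_start == -1 or begin < 0:
        return None
    return read[begin:begin + barcode_len]


def split_by_whitelist(barcodes: Iterable[str],
                       whitelist: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Partition the distinct observed barcodes into whitelisted and unmatched sets."""
    observed = set(barcodes)
    allowed = set(whitelist)
    return observed & allowed, observed - allowed
```

For a read built as adaptor, barcode, UMI, TSO and transcript, both functions return the same 16 nt barcode. The two sets returned by `split_by_whitelist` correspond to the matched and unmatched regions of a Venn diagram against the whitelist.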

Results We introduce a position-based cell barcode identification approach (figure 2A) to aid cell-specific TCR clonotype determination. Only a small fraction of cell barcodes from the long-read scTCR dataset matched the 10x whitelist, while the majority did not (figure 2B). Over half of the scTCR dataset’s barcodes also failed to match, suggesting that barcode correction is needed. Using reads from a selected cell, clustering revealed two main sequence clusters (figure 3A), one enriched in alpha chains and the other in beta chains (figure 3B, top; VDJ genes are detailed at the bottom). IGV visualization confirmed consistent basecalls (figure 3C).
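The clustering result rests on pairwise Mash distances between the reads of one cell. As a hedged illustration of how such a distance is obtained, the sketch below computes a bottom-s MinHash sketch per read and applies the Mash distance formula D = -(1/k) ln(2j/(1+j)), where j is the estimated Jaccard index.8 It is a simplified stand-in for the Mash tool, not a reimplementation: the hash function (BLAKE2b), the k-mer size of 11, the sketch size of 200 and the function names are assumptions for illustration, and canonical (strand-independent) k-mers are omitted.

```python
import hashlib
import math
from typing import List, Sequence


def sketch(seq: str, k: int = 11, size: int = 200) -> List[int]:
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hash values."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)[:size]


def mash_distance(a: Sequence[int], b: Sequence[int],
                  k: int = 11, size: int = 200) -> float:
    """Mash distance from two sketches; returns 1.0 when no hashes are shared."""
    set_a, set_b = set(a), set(b)
    # Estimate the Jaccard index on the `size` smallest hashes of the union.
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 1.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    j = shared / len(merged)
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def distance_matrix(seqs: Sequence[str], k: int = 11,
                    size: int = 200) -> List[List[float]]:
    """Symmetric matrix of pairwise Mash distances between sequences."""
    sketches = [sketch(s, k, size) for s in seqs]
    n = len(sketches)
    return [[0.0 if i == j else mash_distance(sketches[i], sketches[j], k, size)
             for j in range(n)] for i in range(n)]
```

A matrix of this kind could then be passed to a hierarchical clustering routine with Ward's linkage, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`, and the resulting tree cut into two groups to separate putative alpha and beta chain reads.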

Conclusions We presented a workflow that refines scTCR analysis by improving cell barcode validity and enabling precise reconstruction of full-length TCR through long-read sequencing. Further refinement with barcode and UMI correction could enhance TCR repertoire analysis. This workflow is adaptable to other single-cell/spatial TCR technologies.

Acknowledgements This work was supported by the Bioinformatics Institute (BII), Singapore Immunology Network (SIgN), and Agency for Science, Technology and Research (A*STAR). This work was funded by H22J1a0043 and MOH-OFYIRG23jan-0021.

References

  1. Lin B, Hui J, Mao H. Nanopore technology and its applications in gene sequencing. Biosensors 2021;11(7):214. https://doi.org/10.3390/bios11070214

  2. Singh M, Al-Eryani G, Carswell S, Ferguson JM, Blackburn J, Barton K, Roden D, Luciani F, Giang Phan T, Junankar S, Jackson K, Goodnow CC, Smith MA, Swarbrick A. High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nature Communications 2019;10(1):3120. https://doi.org/10.1038/s41467-019-11049-4

  3. Mika J, Candéias SM, Badie C, Polanska J. Can we detect T cell receptors from long-read RNA-Seq data? In: Rojas I, Valenzuela O, Rojas F, Herrera LJ, Ortuño F, eds. Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Cham: Springer, 2022. https://doi.org/10.1007/978-3-031-07802-6_38

  4. Senol Cali D, Kim JS, Ghose S, Alkan C, Mutlu O. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Briefings in Bioinformatics 2019;20(4):1542–1559. https://doi.org/10.1093/bib/bby017

  5. Oehler JB, Wright H, Stark Z, Mallett AJ, Schmitz U. The application of long-read sequencing in clinical settings. Human Genomics 2023;17(1):73. https://doi.org/10.1186/s40246-023-00522-3

  6. Gupta S, Witas R, Voigt A, Semenova T, Nguyen CQ. Single-cell sequencing of T cell receptors: a perspective on the technological development and translational application. Advances in Experimental Medicine and Biology 2020;1255:29–50. https://doi.org/10.1007/978-981-15-4494-1_3

  7. 10x Genomics. What is a barcode whitelist? Available at: https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist (Oct. 2023)

  8. Ondov BD, Treangen TJ, Melsted P, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 2016;17:132. https://doi.org/10.1186/s13059-016-0997-x

  9. Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, Chudakov DM. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods 2015;12(5):380–381. https://doi.org/10.1038/nmeth.3364. PMID: 25924071.

  10. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018;34(18):3094–3100. https://doi.org/10.1093/bioinformatics/bty191

Abstract 159 Figure 1

Workflow of single-cell TCR analysis using short-read and long-read sequencing. (A) Peripheral blood mononuclear cells (PBMCs) were processed to generate full-length TCR cDNA libraries. Sequencing was performed using Illumina for short-read data and PromethION P2 for long-read data.

Abstract 159 Figure 2

Cell barcode identification and validation. (A) Relative nucleotide positions of the Read1 adaptor, cell barcode, and template switch oligo (TSO) in representative reads from the long-read scTCR data. (B) Venn diagram showing the overlap and differences between cell barcodes derived from the long-read scTCR and scTCR datasets when compared against the known barcode whitelist. 14.1% of long-read scTCR barcodes matched both the scTCR dataset and the 10x barcode whitelist, whereas 78.4% differed. More than 50% of the scTCR dataset’s barcodes did not match the known whitelist.

Abstract 159 Figure 3

Segregation of sequences from a single cell into alpha and beta chains, VDJ gene identification, and nucleotide consistency assessment. (A) Using reads from a randomly selected cell, hierarchical clustering based on the pairwise Mash distance matrix clearly distinguishes two main groups. (B) MiXCR analysis: the top table indicates that the left cluster (orange dendrogram branch) predominantly consists of alpha chains (98.2%), while the right cluster (green branch) is primarily composed of beta chains (97.53%); the bottom table lists the identified VDJ gene combinations for each cluster. This indicates that our clustering approach can effectively segregate the alpha and beta chains. (C) IGV visualization of representative sequences from the beta group aligned to the TRBV2 gene, showing consistent basecalls at individual loci.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.
