Abstract
Background Beyond conventional short-read Next-Generation Sequencing (NGS), long-read sequencing technologies such as Oxford Nanopore Technologies (ONT)1 enhance the coverage and resolution of genomic fragments, which is crucial for detecting T Cell Receptors (TCRs), known for their variability arising from VDJ recombination.2 Short-read NGS sequences from the 3’ or 5’ end of the cDNA template; in 3’ short-read sequencing, adapters and barcodes located near the constant region hinder capture of the variable region. Long-read sequencing captures the entire TCR region, providing better resolution of the complete CDR3.3 However, long reads have higher error rates,4 requiring a robust bioinformatic pipeline.5 6 We aim to improve the accuracy and reliability of TCR repertoire reconstruction, enhancing clonotype identification for T-cell vaccine development.
Methods Human PBMCs were processed into full-length TCR libraries using Chromium Next GEM Single Cell 5’ v2 (10x Genomics) and sequenced on an Illumina HiSeq X, producing the single-cell TCR (scTCR) dataset (figure 1A). A separate aliquot of full-length TCR cDNA was sequenced using the Oxford Nanopore Technologies (ONT) Ligation Sequencing Kit V14 and a PromethION Flow Cell (figure 1A), with Dorado duplex basecalling, producing the long-read scTCR dataset.
Long-read scTCR cell barcodes were identified by their position following the 5’ adaptor (figure 2A). In the scTCR dataset, where adaptors had been removed, barcodes were located using the TSO sequence. Extracted barcodes were validated against the known whitelist,7 and the overlap was visualized using the VennDiagram R package. Reads were segregated by unique barcode to identify cell-specific TCR alpha/beta chains. Hierarchical clustering was performed on Mash distances8 using Ward’s linkage. TCR alpha/beta clusters were verified using MiXCR-align.9 Sequencing errors were assessed against the MiXCR reference using minimap2,10 and visualized in IGV.
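The position-based barcode extraction and whitelist check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the adaptor sequence, the toy reads, the exact-match adaptor search, and the function names `extract_barcode` and `split_by_whitelist` are all hypothetical, and the 16-nt barcode length is an assumption. Real long reads would likely need error-tolerant adaptor matching.

```python
from typing import Iterable, Optional, Set, Tuple

BARCODE_LEN = 16  # assumption: length of the cell barcode in nucleotides


def extract_barcode(read: str, adapter: str,
                    barcode_len: int = BARCODE_LEN) -> Optional[str]:
    """Return the bases immediately after the first exact adaptor match.

    Returns None if the adaptor is absent or the read ends before a
    full-length barcode could be taken.
    """
    start = read.find(adapter)
    if start == -1:
        return None
    begin = start + len(adapter)
    barcode = read[begin:begin + barcode_len]
    return barcode if len(barcode) == barcode_len else None


def split_by_whitelist(barcodes: Iterable[str],
                       whitelist: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split the unique observed barcodes into (matched, unmatched) sets."""
    observed = set(barcodes)
    return observed & whitelist, observed - whitelist


# Toy data: the adaptor and reads are made up for illustration only.
adapter = "ACGTACGT"
reads = [
    "TT" + adapter + "AAACCCGGGTTTAAAC" + "GATTACA",  # barcode on whitelist
    adapter + "TTTTGGGGCCCCAAAA" + "CC",              # barcode not on whitelist
    "GGGGGGGG",                                       # no adaptor
    adapter + "AAAC",                                 # truncated barcode
]
whitelist = {"AAACCCGGGTTTAAAC"}

extracted = [extract_barcode(r, adapter) for r in reads]
matched, unmatched = split_by_whitelist(
    [b for b in extracted if b is not None], whitelist
)
```

The two returned sets correspond to the overlap and non-overlap regions that a Venn diagram of observed barcodes versus the whitelist would display.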
Results We introduce a position-based cell barcode identification approach (figure 2A) to aid cell-specific TCR clonotype determination. Only a few cell barcodes from the long-read scTCR dataset matched the 10x whitelist, while the majority did not (figure 2B). Over half of the scTCR dataset’s barcodes also failed to match, suggesting that barcode correction is needed. Using reads from a selected cell, clustering revealed two main sequence clusters (figure 3A), one enriched in alpha chains and the other in beta chains (figure 3B, top; VDJ genes are detailed at the bottom). IGV visualization confirmed consistent basecalls (figure 3C).
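One common way to approach the barcode correction suggested by these results is to rescue barcodes that lie within a single substitution of exactly one whitelist entry. The Python sketch below illustrates that general idea only. It is not a method reported in this work, the function names `hamming_neighbours` and `correct_barcode` are hypothetical, and it ignores the insertions and deletions that are frequent in nanopore reads.

```python
from typing import Iterator, Optional, Set


def hamming_neighbours(barcode: str, alphabet: str = "ACGT") -> Iterator[str]:
    """Yield every sequence that differs from `barcode` by one substitution."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def correct_barcode(barcode: str, whitelist: Set[str]) -> Optional[str]:
    """Map a barcode to the whitelist, allowing one substitution.

    Exact matches are returned unchanged. Otherwise the barcode is
    corrected only if exactly one whitelist entry is a single
    substitution away; ambiguous or distant barcodes return None.
    """
    if barcode in whitelist:
        return barcode
    candidates = {n for n in hamming_neighbours(barcode) if n in whitelist}
    return candidates.pop() if len(candidates) == 1 else None


# Toy whitelist with short barcodes, for illustration only.
toy_whitelist = {"AAAA", "AAAT", "CCCC"}
exact = correct_barcode("AAAA", toy_whitelist)      # already valid
rescued = correct_barcode("CCCG", toy_whitelist)    # one mismatch from CCCC
ambiguous = correct_barcode("AAAC", toy_whitelist)  # near both AAAA and AAAT
distant = correct_barcode("GGGG", toy_whitelist)    # no close whitelist entry
```

Rejecting ambiguous barcodes rather than picking one at random keeps reads from being assigned to the wrong cell, at the cost of discarding some data.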
Conclusions We present a workflow that refines scTCR analysis by improving cell barcode validity and enabling precise reconstruction of full-length TCRs through long-read sequencing. Further refinement with barcode and UMI correction could enhance TCR repertoire analysis. This workflow is adaptable to other single-cell/spatial TCR technologies.
Acknowledgements This work was supported by the Bioinformatics Institute (BII), Singapore Immunology Network (SIgN), and Agency for Science, Technology and Research (A*STAR). This work was funded by H22J1a0043 and MOH-OFYIRG23jan-0021.
References
Lin B, Hui J, Mao H. Nanopore technology and its applications in gene sequencing. Biosensors 2021;11(7):214. https://doi.org/10.3390/bios11070214
Singh M, Al-Eryani G, Carswell S, Ferguson JM, Blackburn J, Barton K, Roden D, Luciani F, Giang Phan T, Junankar S, Jackson K, Goodnow CC, Smith MA, Swarbrick A. High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nature Communications 2019;10(1):3120. https://doi.org/10.1038/s41467-019-11049-4
Mika J, Candéias SM, Badie C, Polanska J. Can we detect T cell receptors from long-read RNA-Seq data? In: Rojas I, Valenzuela O, Rojas F, Herrera LJ, Ortuño F (eds). Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham, 2022. https://doi.org/10.1007/978-3-031-07802-6_38
Senol Cali D, Kim JS, Ghose S, Alkan C, Mutlu O. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Briefings in Bioinformatics 2019;20(4):1542–1559. https://doi.org/10.1093/bib/bby017
Oehler JB, Wright H, Stark Z, Mallett AJ, Schmitz U. The application of long-read sequencing in clinical settings. Human Genomics 2023;17(1):73. https://doi.org/10.1186/s40246-023-00522-3
Gupta S, Witas R, Voigt A, Semenova T, Nguyen CQ. Single-cell sequencing of T cell receptors: a perspective on the technological development and translational application. Advances in Experimental Medicine and Biology 2020;1255:29–50. https://doi.org/10.1007/978-981-15-4494-1_3
10x Genomics. What is a barcode whitelist? Available at: https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist (Oct. 2023)
Ondov BD, Treangen TJ, Melsted P, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology 2016;17:132. https://doi.org/10.1186/s13059-016-0997-x
Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, Chudakov DM. MiXCR: software for comprehensive adaptive immunity profiling. Nature Methods 2015;12(5):380–381. https://doi.org/10.1038/nmeth.3364
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018;34(18):3094–3100. https://doi.org/10.1093/bioinformatics/bty191
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.