Immunosequencing: applications of immune repertoire deep sequencing
Introduction
The human adaptive immune system provides protection against an enormous variety of pathogens. This protection is mediated by receptors on the surface of B and T cells that bind to pathogenic or pathogen-derived antigens. The human germline genome is limited in size, so it cannot encode enough receptor genes to protect against the diversity of potential pathogens. Instead, the vast receptor diversity needed for protection is created dynamically by somatic rearrangement of the germline DNA at specific loci in B and T cells. Both the B cell receptor (BCR) and the T cell receptor (TCR) are formed by pairing a larger chain with a smaller chain. The BCR immunoglobulin heavy chain (IGH) and the TCR beta chain (TCRB) rearrange noncontiguous variable (V), diversity (D), and joining (J) gene segments to create combinatorial diversity. At the V–D and D–J junctions, nucleotides are deleted and pseudo-random non-templated nucleotides are added, creating massive junctional diversity. The respective small chains, immunoglobulin lambda or kappa (IGL/K) for B cells and T cell receptor alpha (TCRA) for T cells, rearrange similarly, but with V and J segments only. The set of B and T cells constituting the adaptive immune system is composed of millions of different clones, each defined by its specific BCR or TCR sequence rearrangement. The nucleotide sequence of each BCR or TCR provides a nearly unique molecular tag for its clone in the adaptive immune system. Moreover, these sequences provide a primary piece of functional information for each clone, as the receptor structure determines antigen binding.
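The rearrangement process described above can be illustrated with a small simulation. The following is a minimal sketch, not a model of real TCRB biology: the segment names and sequences are hypothetical toy values, and the parameters `max_delete` and `max_insert` are assumptions chosen only to show how junctional deletion and non-templated insertion multiply the diversity obtained from V, D, and J choice alone.

```python
import random

# Hypothetical toy germline segments (NOT real TCRB gene sequences).
V_SEGMENTS = {"V1": "TGTGCCAGCAGCTTAG", "V2": "TGTGCCAGCAGTGAAG", "V3": "TGTGCCAGCAGCCAAG"}
D_SEGMENTS = {"D1": "GGGACAGGGGGC", "D2": "GGGACTAGCGGGGGGG"}
J_SEGMENTS = {"J1": "CTGAAGCTTTCTTT", "J2": "CTATGGCTACACCTTC"}


def trim(seq, n, from_end):
    """Delete n nucleotides from the 3' end (from_end=True) or the 5' end."""
    n = min(n, len(seq))
    if n == 0:
        return seq
    return seq[:-n] if from_end else seq[n:]


def random_insert(rng, max_len):
    """Return 0..max_len pseudo-random non-templated nucleotides."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def rearrange(rng, max_delete=4, max_insert=6):
    """Simulate one V-D-J rearrangement; returns (segment names, sequence)."""
    v_name = rng.choice(sorted(V_SEGMENTS))
    d_name = rng.choice(sorted(D_SEGMENTS))
    j_name = rng.choice(sorted(J_SEGMENTS))
    # Deletions at the segment ends that face a junction.
    v = trim(V_SEGMENTS[v_name], rng.randint(0, max_delete), from_end=True)
    d = trim(D_SEGMENTS[d_name], rng.randint(0, max_delete), from_end=False)
    d = trim(d, rng.randint(0, max_delete), from_end=True)
    j = trim(J_SEGMENTS[j_name], rng.randint(0, max_delete), from_end=False)
    # Non-templated insertions at the V-D and D-J junctions.
    n1 = random_insert(rng, max_insert)
    n2 = random_insert(rng, max_insert)
    return (v_name, d_name, j_name), v + n1 + d + n2 + j


def combinatorial_count():
    """Number of distinct V-D-J segment choices, ignoring the junctions."""
    return len(V_SEGMENTS) * len(D_SEGMENTS) * len(J_SEGMENTS)


rng = random.Random(42)  # fixed seed so the run is repeatable
sequences = {rearrange(rng)[1] for _ in range(1000)}
print("V-D-J combinations:", combinatorial_count())
print("distinct sequences in 1000 draws:", len(sequences))
```

In this toy setting the segment choice alone allows only a handful of combinations, while the number of distinct simulated sequences is far larger, which mirrors the distinction between combinatorial and junctional diversity made in the text.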
The highly variable complementarity-determining region 3 (CDR3) sequences in both BCRs and TCRs are short, between 15 and 60 nucleotides, making them amenable to rapid interrogation by high-throughput sequencing (HTS). However, methods designed to sequence large genomic regions do not apply efficiently to these short, highly diverse regions. A new field of immunosequencing has emerged with technologies specifically tailored to sequence BCRs and TCRs, along with a set of promising applications for these technologies [1, 2, 3, 4]. As the adaptive immune system is believed to play a role in most, if not all, human disease states, the possible applications are expansive.
In this review, we present the technical challenges to high-throughput sequencing of adaptive immune receptors and the present set of solutions, covering both experimental and computational issues. We also classify immunosequencing applications into categories, describe progress to date in each area, and speculate on future developments.
Challenges
There are two primary challenges in HTS of adaptive immune receptors. The first is the somatic rearrangement of the loci. The rearranged BCR and TCR loci are structurally different from the germline genome. Although some known rules govern the rearrangements, the resultant genes are not minor variations on a known template. The second is that the rearranged sequences are highly diverse. The number of clones with different TCRB rearrangements in the blood of a healthy human is estimated to
Genomic DNA
As discussed above in Challenges, enriching for rearranged adaptive immune receptor gDNA is complicated by the J-segment-specific intron between the J segment and the constant region. Although some homology exists within the 13 J segments and within the 48 V segments, there is no shared sequence long enough to bind a universal degenerate primer. In order to specifically enrich for the full set of potential TCRB rearrangements, a multiplex PCR strategy has been employed, utilizing a mixture of V and J
Sequencing
At present, there are a few different options for HTS technologies, each of which offers advantages and drawbacks. A common feature is that each technology requires specific DNA sequences on both ends of the target molecules to be sequenced. These sequences are added either by synthesis using PCR or by ligation. The sample preparation steps differ significantly, but an evaluation is beyond the scope of this review. We give a short list of pros and cons for each of the three most common
Data processing
Both PCR amplification and HTS generate errors in the resultant sequences. PCR errors can propagate, with the potential for errors to compound at each PCR cycle. Fortunately, these compound errors affect an exponentially smaller fraction of the total reads at each step. Effectively, the PCR errors can be modeled as a phylogenetic tree. Since these trees do not undergo selection, the number of elements strictly decreases along each branch. Moreover, the PCR error rate is sufficiently small that
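The tree picture above suggests one simple way to clean up read counts: fold each rare sequence into a much more abundant near-identical "parent". The following is a minimal sketch of that idea and is not the procedure used in the work reviewed here, whose details are not given in this excerpt. The function names, the Hamming-distance threshold `max_dist`, and the abundance ratio `min_ratio` are all illustrative assumptions.

```python
from collections import Counter


def hamming(a, b):
    """Hamming distance between equal-length sequences, else None."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def collapse_errors(read_counts, max_dist=1, min_ratio=10):
    """Greedily merge rare sequences into abundant near-identical parents.

    Assumption (hypothetical heuristic): a sequence is treated as an error
    if it lies within max_dist mismatches of a kept sequence that is at
    least min_ratio times more abundant.
    """
    # Visit sequences from most to least abundant, so parents are seen first.
    ordered = sorted(read_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    parents = {}
    for seq, count in ordered:
        target = None
        for parent, parent_count in parents.items():
            dist = hamming(seq, parent)
            if dist is not None and dist <= max_dist and parent_count >= min_ratio * count:
                target = parent
                break
        if target is None:
            parents[seq] = count      # keep as a distinct clone
        else:
            parents[target] += count  # reassign reads to the parent
    return parents


# Toy read counts (made-up data for illustration only).
reads = Counter({
    "ACGTACGT": 1000,
    "ACGTACGA": 12,   # one mismatch from the first sequence
    "ACGTTCGT": 8,    # one mismatch from the first sequence
    "TTGTACGT": 900,  # abundant, so kept as its own clone
    "ACGAACGA": 500,
})
collapsed = collapse_errors(reads)
print(collapsed)
```

Processing sequences in order of decreasing abundance reflects the observation that error-derived branches carry fewer reads than their parents. A real pipeline would also need to handle insertions and deletions and platform-specific sequencing errors, which this sketch ignores.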
Applications
The set of immunosequencing applications [19] to date can be divided into three categories: clone tracking, repertoire properties, and identification of public clones. Clone tracking is based on the observation that both T and B cell repertoires are diverse, with a small probability of different clones sharing nucleotide-identical TCRβ or IGH sequences. Therefore, the immune receptor nucleotide sequence is a nearly unique molecular tag for a clone, allowing clone tracking over time and between
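Because the receptor sequence acts as a nearly unique tag, clone tracking reduces to matching identical nucleotide sequences across samples and comparing their frequencies. The following is a minimal sketch under that assumption. The function names and the toy read counts are hypothetical, and the sketch assumes the counts have already been error-corrected.

```python
def clone_frequencies(counts):
    """Convert per-sequence read counts into fractions of the sample."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}


def track_clones(sample_a, sample_b):
    """Return {sequence: (frequency in A, frequency in B)} for shared clones.

    Clones are matched by exact nucleotide identity of the receptor sequence.
    """
    freq_a = clone_frequencies(sample_a)
    freq_b = clone_frequencies(sample_b)
    shared = sorted(set(freq_a) & set(freq_b))
    return {seq: (freq_a[seq], freq_b[seq]) for seq in shared}


# Toy read counts for two time points (made-up data for illustration only).
before = {"TGTGCCAGCAGCTTAGGG": 50, "TGTGCCAGCAGTGAAGCT": 30, "TGTGCCAGCAGCCAAGAC": 20}
after = {"TGTGCCAGCAGCTTAGGG": 10, "TGTGCCAGCAGCCAAGAC": 80, "TGTGCCAGTAGCGGGACA": 10}

for seq, (f_before, f_after) in track_clones(before, after).items():
    print(f"{seq}: {f_before:.2f} -> {f_after:.2f}")
```

Matching on the nucleotide sequence rather than the translated amino acid sequence is deliberate here: distinct clones can converge on the same amino acid sequence, whereas nucleotide identity between independent clones is, per the text, improbable.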
Future developments
The full T or B cell receptor consists of a large and a small chain that are encoded by genes on different chromosomes. In order to do functional studies and identify specific antigenic targets of TCRs or BCRs, sequencing the pair of large and small chains is important. However, matching the large and small chain pairs from the same cell at a scale that would be high throughput (i.e., screening of thousands of cells per assay) is a significant challenge. To date, the primary strategies have
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
References (47)
- et al. Comprehensive assessment of T-cell receptor beta-chain diversity in alphabeta T cells. Blood (2009)
- et al. Ultra-sensitive detection of rare T cell clones. J Immunol Methods (2012)
- et al. Individual variation in the germline Ig gene repertoire inferred from variable region gene rearrangements. J Immunol (2010)
- et al. Immune surveillance by CD8alphaalpha+ skin-resident T cells in human herpes virus infection. Nature (2013)
- et al. High frequency of herpesvirus-specific clonotypes in the human T cell repertoire can remain stable over decades with minimal turnover. J Virol (2013)
- et al. Allelic exclusion and peripheral reconstitution by TCR transgenic T cells arising from transduced human hematopoietic stem/progenitor cells. Mol Ther (2013)
- et al. Applications of next-generation sequencing to blood and marrow transplantation. Biol Blood Marrow Transplant (2012)
- et al. Genetically retargeting CD8+ lymphocyte subsets for cancer immunotherapy. Curr Opin Immunol (2011)
- et al. New tools for classification and monitoring of autoimmune diseases. Nat Rev Rheumatol (2012)
- et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat Biotechnol (2013)