Main

Non-Hodgkin lymphomas (NHLs) are cancers of B, T or natural killer lymphocytes. The two most common types of NHL, follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL), together comprise 60% of new B-cell NHL diagnoses each year in North America1. FL is an indolent and typically incurable disease characterized by clinical and genetic heterogeneity. DLBCL is aggressive and likewise heterogeneous, comprising at least two distinct subtypes that respond differently to standard treatments. Both FL and the germinal centre B-cell (GCB) cell-of-origin (COO) subtype of DLBCL derive from germinal centre B cells, whereas the activated B-cell (ABC) subtype, which has a more aggressive clinical course, is thought to originate from B cells that have exited, or are poised to exit, the germinal centre2. Current knowledge of the specific genetic events leading to DLBCL and FL is limited to a few recurrent genetic abnormalities2. For example, 85–90% of FL and 30–40% of GCB DLBCL cases3,4 harbour t(14;18)(q32;q21), which results in deregulated expression of the BCL2 oncoprotein. Other genetic abnormalities unique to GCB DLBCL include amplification of the c-REL gene and of the miR-17-92 microRNA cluster5. In contrast, 24% of ABC DLBCLs harbour structural alterations or inactivating mutations affecting PRDM1, which is involved in the differentiation of GCB cells into antibody-secreting plasma cells6. ABC-specific mutations also affect genes regulating NF-κB signalling7,8,9, with TNFAIP3 (also known as A20) and MYD88 (ref. 10) the most frequently mutated, in 24% and 39% of cases, respectively. To enhance our understanding of the genetic architecture of B-cell NHL, we undertook a study to (1) identify somatic mutations and (2) determine the prevalence, expression and focal recurrence of mutations in FL and DLBCL.
Using approaches that we and others have applied to cancer genome and transcriptome characterization11,12,13, we sequenced tumour DNA and/or RNA from 117 tumour samples and 10 cell lines (Supplementary Tables 1 and 2) and identified 651 genes (Supplementary Figure 1) with evidence of somatic mutation in B-cell NHL. After validation, we found that 109 genes were somatically mutated in two or more NHL cases. We further characterized the frequency and nature of mutations within MLL2 and MEF2B, which were among the most frequently mutated genes and had no previously known role in lymphoma.

Identification of recurrently mutated genes

We sequenced the genomes or exomes of 14 NHL cases, each with matched constitutional DNA sequenced to comparable depth (Supplementary Tables 1 and 2). After calling single nucleotide variants, subtracting known polymorphisms and visually inspecting the sequence read alignments, we identified 717 non-synonymous variants (coding single nucleotide variants; cSNVs) affecting 651 genes (Supplementary Figure 1 and Supplementary Methods). We identified between 20 and 135 cSNVs in each of these genomes. Only 25 of the 651 genes with cSNVs were represented in the Cancer Gene Census (December 2010 release)14.

We performed RNA sequencing (RNA-seq) on these 14 NHL cases and an expanded set of 113 samples comprising 83 DLBCL, 12 FL and 8 B-cell NHL cases with other histologies and 10 DLBCL-derived cell lines (Supplementary Table 2). We analysed these data to identify novel fusion transcripts (Supplementary Table 3) and cSNVs (Fig. 1). We identified 240 genes with at least one cSNV in a genome/exome or an RNA-seq ‘mutation hot spot’ (see later), and with cSNVs in at least three cases in total (Supplementary Table 4). We selected cSNVs from each of these 240 genes for re-sequencing to confirm their somatic status. We did not re-sequence genes with previously documented mutations in lymphoma (for example, CD79B, BCL2). We confirmed the somatic status of 543 cSNVs in 317 genes, with 109 genes having at least two confirmed somatic mutations (Supplementary Tables 4 and 5). Of the successfully re-sequenced cSNVs predicted from the genomes, 171 (94.5%) were confirmed somatic, 7 were false calls and 3 were present in the germ line. These 109 recurrently mutated genes were significantly enriched for genes implicated in lymphocyte activation (P = 8.3 × 10−4; for example, STAT6, BCL10), lymphocyte differentiation (P = 3.5 × 10−3; for example, CARD11), and regulation of apoptosis (P = 1.9 × 10−3; for example, BTG1, BTG2). Also significantly enriched were genes linked to transcriptional regulation (P = 5.4 × 10−4; for example, TP53) and genes involved in methylation (P = 2.2 × 10−4) and acetylation (P = 1.2 × 10−2), including histone methyltransferase (HMT) and acetyltransferase (HAT) enzymes known previously to be mutated in lymphoma (for example, EZH2 (ref. 13) and CREBBP (ref. 15); Supplementary Methods).
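
The enrichment P values above come from an over-representation analysis whose exact procedure is given in the Supplementary Methods. As a minimal sketch, assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), the probability of seeing at least the observed number of annotated genes among the recurrently mutated genes could be computed as follows. The function name and the counts in the usage line are hypothetical, not values from this study.

```python
from math import comb


def enrichment_p_value(n_genome: int, n_category: int, n_selected: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric P value for over-representation of a category.

    n_genome:   genes in the background set (hypothetical input)
    n_category: background genes carrying the annotation, e.g. a GO term
    n_selected: genes in the tested list, e.g. recurrently mutated genes
    n_overlap:  tested genes that carry the annotation
    Returns P(X >= n_overlap) for X ~ Hypergeometric(n_genome, n_category, n_selected).
    """
    total = comb(n_genome, n_selected)
    upper = min(n_category, n_selected)
    # Sum the probabilities of every overlap at least as large as the one observed.
    tail = sum(
        comb(n_category, k) * comb(n_genome - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical usage: 8 of 109 selected genes fall in a 300-gene category.
p = enrichment_p_value(n_genome=20000, n_category=300, n_selected=109, n_overlap=8)
```

Exact integer arithmetic avoids underflow for large backgrounds; scipy.stats.hypergeom.sf(k - 1, M, n, N) gives the same upper tail if SciPy is available.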

Figure 1: Genome-wide visualization of somatic mutation targets in NHL.

Overview of structural rearrangements and copy number variations (CNVs) in the 11 DLBCL genomes and cSNVs in the 109 recurrently mutated genes identified in our analysis. Inner arcs represent somatic fusion transcripts identified in at least one of the 11 genomes. The CNVs and LOH detected in each of the 11 DLBCL tumour/normal pairs are displayed on the concentric sets of rings. The inner 11 rings show regions of enhanced homozygosity plotted in blue (interpreted as LOH). The outer 11 rings show somatic CNVs. Purple circles indicate the positions of genes with at least two confirmed somatic mutations, with circle diameter proportional to the number of cases with cSNVs detected in that gene. Circles representing the genes with significant evidence for positive selection are labelled. Coincidence between recurrently mutated genes and regions of gain or loss is colour-coded in the labels (green, loss; red, gain). For example, B2M, which encodes beta-2-microglobulin, is recurrently mutated and is deleted in two cases.

Mutation hot spots can result from mutations at sites under strong selective pressure, and we have previously identified such sites using RNA-seq data13. We searched our RNA-seq data for genes with mutation hot spots and identified 10 genes that were not mutated in the 14 genomes (PIM1, FOXO1, CCND3, TP53, IRF4, BTG2, CD79B, BCL7A, IKZF3 and B2M), of which five (FOXO1, CCND3, BTG2, IKZF3 and B2M) were not previously known targets of point mutation in NHL (Supplementary Table 6 and Supplementary Methods). FOXO1, BCL7A and B2M had hot spots affecting their start codons. The effect of a FOXO1 start codon mutation, which was observed in three cases, was further studied using a cell line in which the initiating ATG was mutated to TTG. Western blots probed with a FOXO1 antibody revealed a band of reduced molecular weight, indicative of an amino-terminal truncation of FOXO1 (Supplementary Figure 2) and consistent with translation initiating at the next in-frame ATG. A second FOXO1 hot spot, at T24, was mutated in two cases. T24 is reportedly phosphorylated by AKT following B-cell receptor (BCR) stimulation16, which induces FOXO1 nuclear export.
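
The western blot result fits a simple rule: if the first codon is no longer ATG, initiation may fall to the next ATG in the same reading frame, removing the residues encoded upstream of it. The sketch below applies that rule to a coding sequence. It is a deliberate simplification (it ignores out-of-frame ATGs, Kozak context and non-ATG initiation), and the toy sequences are hypothetical rather than FOXO1.

```python
from typing import Optional


def alternative_start(cds: str) -> Optional[int]:
    """Return the nucleotide offset of the first in-frame ATG after codon 1.

    Simplifying assumption: the mutated start codon is skipped and translation
    begins at the next ATG in the original reading frame.
    """
    cds = cds.upper()
    for offset in range(3, len(cds) - 2, 3):  # step codon by codon, skipping codon 1
        if cds[offset:offset + 3] == "ATG":
            return offset
    return None


def truncated_residues(cds: str) -> Optional[int]:
    """Number of N-terminal residues lost if translation starts at the alternative ATG."""
    offset = alternative_start(cds)
    return None if offset is None else offset // 3


# Hypothetical toy CDS whose start codon has been mutated from ATG to TTG.
toy = "TTGGCTCCCATGAAA"
lost = truncated_residues(toy)
```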

We analysed the RNA-seq data to determine whether any of the somatic mutations in the 109 recurrently mutated genes showed evidence of allelic imbalance, with expression favouring one allele. Of 380 expressed heterozygous mutant alleles, we observed preferential expression of the mutant allele for 16.8% (64/380) and preferential expression of the wild-type allele for 27.9% (106/380; Supplementary Table 7). Seven genes showed significant preferential expression of the mutant allele in at least two cases: BCL2, CARD11, CD79B, EZH2, IRF4, MEF2B and TP53 (Supplementary Methods). In 27 of 43 cases with BCL2 cSNVs, expression favoured the mutant allele, consistent with the previously described hypothesis that the translocated (and hence transcriptionally deregulated) allele of BCL2 is targeted by somatic hypermutation17. Mutations at known oncogenic hot spot sites, such as F123I in CARD11 (ref. 18), also showed allelic imbalance favouring the mutant allele in some cases. Similarly, we noted expression favouring the mutant allele at two novel hot spot sites in MEF2B (Y69 and D83) and at two sites in EZH2 not previously reported as mutated in lymphoma (A682G and A692V).
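
Allelic imbalance at a heterozygous site can be framed as a departure from an expected 50:50 split of RNA-seq reads between the mutant and wild-type alleles. A minimal sketch, assuming an exact two-sided binomial test with p = 0.5 and a hypothetical significance level of 0.05, is shown below. The study's actual criteria are described in the Supplementary Methods, and this sketch ignores tumour purity, copy number and multiple testing.

```python
from math import comb


def binomial_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial P value for k successes in n trials when p = 0.5."""
    # With p = 0.5 the distribution is symmetric, so doubling the smaller tail
    # gives the two-sided P value (capped at 1). Integer counts avoid underflow.
    lower = sum(comb(n, i) for i in range(0, k + 1))  # outcomes with X <= k
    upper = sum(comb(n, i) for i in range(k, n + 1))  # outcomes with X >= k
    return min(1.0, 2 * min(lower, upper) / 2 ** n)


def allelic_imbalance(ref_reads: int, alt_reads: int, alpha: float = 0.05) -> str:
    """Label a heterozygous site from wild-type (ref) and mutant (alt) read counts.

    alpha is a hypothetical significance level, not a value from the study.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return "no coverage"
    p = binomial_two_sided_half(alt_reads, n)
    if p >= alpha:
        return "balanced"
    return "mutant" if alt_reads > ref_reads else "wild type"
```

For example, a site with 2 wild-type and 30 mutant reads would be labelled as favouring the mutant allele, whereas 15 reads of each would be labelled balanced.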

We sought to distinguish new cancer-related mutations from passenger mutations using a previously proposed approach19. We reasoned that this would reveal genes with strong signatures of selection and that mutations in such genes would be good candidate cancer drivers. We identified 26 genes with significant evidence for positive selection (false discovery rate = 0.03; Supplementary Methods), showing selective pressure for the acquisition of either non-synonymous point mutations or truncating (nonsense) mutations (Table 1 and Supplementary Table 8). These included known lymphoma oncogenes (BCL2, CD79B (ref. 9), CARD11 (ref. 18), MYD88 (ref. 10) and EZH2 (ref. 13)), all of which showed signatures indicative of selection for non-synonymous variants.
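
The text reports a false discovery rate of 0.03 without naming the correction procedure here. Assuming a Benjamini–Hochberg step-up procedure (a common choice, but an assumption), gene-level selection P values could be filtered as sketched below; the function name and the P values in the usage line are illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[int]:
    """Return the indices of hypotheses rejected by the BH step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P value
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every hypothesis ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene selection P values.
selected = benjamini_hochberg([0.0004, 0.2, 0.003, 0.04], q=0.03)
```

The step-up rule rejects all hypotheses up to the largest passing rank, even if some intermediate P values individually exceed their own threshold.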

Table 1 Overview of cSNVs and confirmed somatic mutations in most frequently mutated genes

Evidence for selection of inactivating changes

We expected tumour suppressor genes to show strong selection for the acquisition of nonsense mutations. In our analysis, seven of the eight most significant genes showed strong selective pressure for nonsense mutations, including the known tumour suppressor genes TP53 and TNFRSF14 (ref. 20; Table 1). CREBBP, recently reported as commonly inactivated in DLBCL15, also showed some evidence for acquisition of nonsense mutations and cSNVs (Supplementary Figure 3 and Supplementary Table 9). We also observed enrichment for nonsense mutations in BCL10, a positive regulator of NF-κB, for which oncogenic truncated products have been described in lymphomas21. The remaining strongly significant genes (BTG1, GNA13, SGK1 and MLL2) had no reported role in lymphoma. GNA13 was affected by mutations in 22 cases, including multiple nonsense mutations. GNA13 encodes the α subunit (Gα13) of a heterotrimeric G protein that couples to G-protein-coupled receptors and modulates RhoA activity22. Mutations at some of the affected residues have been shown to impair its function23,24, including a T203A mutation, which also showed allelic imbalance favouring the mutant allele (Supplementary Table 7). GNA13 protein was reduced or absent on western blots of cell lines harbouring a nonsense mutation, a stop codon deletion, a frameshifting deletion or changes affecting splice sites (Supplementary Methods and Supplementary Figure 4).
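
One simple way to ask whether a gene carries more nonsense mutations than expected is a one-sided binomial test against the fraction of coding substitutions that would create a stop codon under neutrality. This is a simplified stand-in, not the selection model cited above (ref. 19); the expected fraction would have to be estimated separately from the gene's codon composition and mutation spectrum, and the function name and numbers in the usage line are hypothetical.

```python
from math import comb


def nonsense_enrichment_p(n_mutations: int, n_nonsense: int, expected_fraction: float) -> float:
    """One-sided P(X >= n_nonsense) for X ~ Binomial(n_mutations, expected_fraction).

    expected_fraction: assumed proportion of coding substitutions in the gene that
    would create a stop codon in the absence of selection (estimated separately).
    """
    if not 0.0 < expected_fraction < 1.0:
        raise ValueError("expected_fraction must lie strictly between 0 and 1")
    # Sum binomial probabilities for all counts at least as extreme as observed.
    return sum(
        comb(n_mutations, k)
        * expected_fraction ** k
        * (1.0 - expected_fraction) ** (n_mutations - k)
        for k in range(n_nonsense, n_mutations + 1)
    )


# Hypothetical gene: 6 of 10 coding mutations are nonsense, 5% expected under neutrality.
p = nonsense_enrichment_p(n_mutations=10, n_nonsense=6, expected_fraction=0.05)
```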

SGK1 encodes a phosphatidylinositol-3-OH kinase (PI(3)K)-regulated kinase with functions including regulation of FOXO transcription factors25, regulation of NF-κB by phosphorylating IκB kinase26, and negative regulation of NOTCH signalling27. SGK1 also resides within a region of chromosome 6 commonly deleted in DLBCL (Fig. 1)5. The mechanism by which SGK1 and GNA13 inactivation may contribute to lymphoma is unclear, but the strong apparent selection for their inactivation and their high overall mutation frequency (each mutated in 18 of 106 DLBCL cases) suggest that their loss contributes to B-cell NHL. Certain genes are known to be mutated more commonly in GCB DLBCLs (for example, TP53 (ref. 28) and EZH2 (ref. 13)). In our cohort, SGK1 and GNA13 mutations were found only in GCB cases (P = 1.93 × 10−3 and 2.28 × 10−4, Fisher’s exact test; n = 15 and 18, respectively) (Fig. 2). Two additional genes (MEF2B and TNFRSF14) with no previously described role in DLBCL showed a similar restriction to GCB cases (Fig. 2).
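
The restriction of SGK1 and GNA13 mutations to GCB cases was assessed with Fisher's exact test. A minimal two-sided version for a 2×2 table of mutated and unmutated cases by GCB and ABC subtype is sketched below. The counts in the usage line are hypothetical and are not meant to reproduce the reported P values, and whether the study used a one- or two-sided test is not stated here. If SciPy is available, scipy.stats.fisher_exact provides an equivalent test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are no
    more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= observed * (1 + 1e-7):  # small tolerance so exact ties are kept
            total += p
    return min(1.0, total)


# Hypothetical table: rows = mutated / unmutated, columns = GCB / ABC.
p = fisher_exact_two_sided(a=10, b=0, c=30, d=35)
```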

Figure 2: Overview of mutations and potential cooperative interactions in NHL.

This heat map displays possible trends towards co-occurrence (red) and mutual exclusion (blue) of somatic mutations and structural rearrangements. Colours were assigned by taking the minimum value of a left- and right-tailed Fisher’s exact test. To capture trends, a P-value threshold of 0.3 was used, with the darkest shade of each colour indicating pairs meeting statistical significance (P ≤ 0.05). The relative frequency of mutations in ABC (blue), GCB (red), unclassifiable (black) DLBCLs and FL (yellow) cases is shown on the left. Genes were arranged with those having significant (P < 0.05, Fisher’s exact test) enrichment for mutations in ABC cases (blue triangle) towards the top (and left) and those with significant enrichment for mutations in GCB cases (red triangle) towards the bottom (and right). The total number of cases in which each gene contained either cSNVs or confirmed somatic mutations is shown at the top. The cluster of blue squares (upper right) results from the mutual exclusion of the ABC-enriched mutations (for example, MYD88, CD79B) from the GCB-enriched mutations (for example, EZH2, GNA13). The presence of structural rearrangements involving the two oncogenes BCL6 and BCL2 (indicated as BCL6s and BCL2s) was determined by FISH with break-apart probes (Supplementary Methods).
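
The colour rule in this legend can be written as a small function taking the left- and right-tailed Fisher's exact P values for a pair of genes. This sketch assumes that a small right-tailed P value indicates co-occurrence and a small left-tailed P value indicates mutual exclusion, treats both thresholds as inclusive, and collapses the intermediate shading into a single "light" level; the legend does not specify these details, and the function name is hypothetical.

```python
from typing import Optional, Tuple

TREND_THRESHOLD = 0.3          # trends are shown at or below this P value (legend)
SIGNIFICANCE_THRESHOLD = 0.05  # darkest shade at or below this P value (legend)


def heatmap_cell(p_left: float, p_right: float) -> Optional[Tuple[str, str]]:
    """Map left/right-tailed Fisher P values for a gene pair to (colour, shade).

    A small right tail suggests co-occurrence (red); a small left tail suggests
    mutual exclusion (blue). Returns None when neither tail shows a trend.
    """
    p = min(p_left, p_right)
    if p > TREND_THRESHOLD:
        return None  # cell left uncoloured
    colour = "red" if p_right <= p_left else "blue"
    shade = "dark" if p <= SIGNIFICANCE_THRESHOLD else "light"
    return colour, shade
```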

Inactivating MLL2 mutations

MLL2 showed the most significant evidence for selection and the largest number of nonsense SNVs. Our RNA-seq analysis indicated that 26.0% (33/127) of cases carried at least one MLL2 cSNV. To address the possibility that variable RNA-seq coverage of MLL2 failed to capture some mutations, we PCR-amplified the entire MLL2 locus (36 kilobases) in 89 cases (35 primary FLs, 17 DLBCL cell lines and 37 DLBCLs). Of these cases, 58 were among the RNA-seq cohort. Illumina amplicon re-sequencing (Supplementary Methods) revealed 78 mutations, confirming the RNA-seq mutations in the overlapping cases and identifying 33 additional mutations. We confirmed the somatic status of 46 variants using Sanger sequencing (Supplementary Table 10). Of the 33 additional mutations, 20 were insertions or deletions (indels), three were SNVs at splice sites and 10 were cSNVs that had not been detected by RNA-seq.

The somatic mutations were distributed across MLL2 (Fig. 3a). Of these, 37% (n = 29/78) were nonsense mutations, 46% (n = 36/78) were indels that altered the reading frame, 8% (n = 6/78) were point mutations at splice sites and 9% (n = 7/78) were non-synonymous amino acid substitutions (Table 2). Four of the somatic splice site mutations altered MLL2 transcript length and structure. For example, two heterozygous splice site mutations resulted in the use of a novel splice donor site and in an intron retention event.
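
The percentages above, and the 91% of MLL2 mutations later described as inactivating, can be rechecked from the counts in Table 2. The sketch below groups nonsense, frameshift and splice-site mutations as disruptive; that grouping is our reading of the text rather than a definition taken from the study.

```python
# Counts of MLL2 somatic mutation types as reported in the text (n = 78).
mutation_counts = {
    "nonsense": 29,
    "frameshift indel": 36,
    "splice site": 6,
    "missense": 7,
}

total = sum(mutation_counts.values())
# Percentage of each class, rounded to the nearest whole percent as in the text.
percentages = {kind: round(100 * n / total) for kind, n in mutation_counts.items()}

# Classes assumed to disrupt the protein (truncating or reading-frame/splice altering).
disruptive = {"nonsense", "frameshift indel", "splice site"}
disruptive_pct = round(100 * sum(mutation_counts[k] for k in disruptive) / total)
```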

Figure 3: Summary and effect of somatic mutations affecting MLL2 and MEF2B.

a, Re-sequencing the MLL2 locus in 89 samples revealed mainly nonsense (red circles) and frameshift-inducing indel mutations (orange triangles; inverted triangles for insertions and upright triangles for deletions). A smaller number of non-synonymous somatic mutations (green circles) and point mutations or deletions affecting splice sites (yellow stars) were also observed. All of the non-synonymous point mutations affected a residue within either the catalytic SET domain, the FYRC domain (FY-rich carboxy-terminal domain) or PHD zinc finger domains. The effect of these splice-site mutations on MLL2 splicing was also explored (Supplementary Figure 7). b, The cSNVs and somatic mutations found in MEF2B in all FL and DLBCL cases sequenced are shown with the same symbols. Only the amino acids with variants in at least two patients are labelled. cSNVs were most prevalent in the first two protein-coding exons of MEF2B (exons 2 and 3). The crystal structure of MEF2 bound to EP300 supports the idea that two of the mutated sites (L67 and Y69) are important in the interaction between these proteins (Supplementary Figure 8 and Supplementary Discussion)50.

Table 2 Summary of types of MLL2 somatic mutations

Approximately half of the NHL cases we sequenced had two MLL2 mutations (Supplementary Table 10). In eight FL cases, sequencing of bacterial artificial chromosome (BAC) clones showed that the two mutations were in trans in every case, affecting both MLL2 alleles. This observation is consistent with complete, or near-complete, loss of MLL2 function in the tumour cells of such patients.
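
The logic of BAC-based phasing can be sketched as follows, assuming that each clone derives from a single chromosome, spans both mutated positions and has been genotyped at both. A clone carrying exactly one of the two mutations is incompatible with a cis configuration, whereas a clone carrying both indicates cis. The function and its inputs are hypothetical; the study's procedure is not detailed here.

```python
from typing import Iterable, Tuple


def phase_two_mutations(clones: Iterable[Tuple[bool, bool]]) -> str:
    """Infer whether two heterozygous mutations lie in cis or in trans.

    Each clone is summarised as (carries_mutation_1, carries_mutation_2) and is
    assumed to represent a single haplotype covering both positions.
    """
    saw_cis = saw_trans = False
    for m1, m2 in clones:
        if m1 and m2:
            saw_cis = True    # both mutations on the same molecule
        elif m1 != m2:
            saw_trans = True  # exactly one mutation on this molecule
    if saw_cis and saw_trans:
        return "inconsistent"
    if saw_cis:
        return "cis"
    if saw_trans:
        return "trans"
    return "uninformative"
```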

With the exception of two primary FL cases and two DLBCL cell lines (Pfeiffer and SU-DHL-9), most MLL2 mutations appeared to be heterozygous. Analysis of Affymetrix 500k SNP array data from the two FL cases with apparently homozygous mutations revealed that both tumours showed copy-number-neutral loss of heterozygosity (LOH) for the region of chromosome 12 containing MLL2 (Supplementary Methods). Thus, in addition to bi-allelic mutation, LOH is a second, albeit less common, mechanism by which MLL2 function is lost.
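
Copy-neutral LOH is recognized by the combination of lost heterozygosity at SNPs that are heterozygous in the matched normal and an unchanged copy number of about two. A toy classifier for one genomic region is sketched below. The inputs (per-SNP heterozygosity calls and a mean copy-number estimate) and both thresholds are hypothetical placeholders, and real SNP array analyses typically segment allele-specific signals genome-wide rather than classifying predefined regions.

```python
from typing import Sequence


def classify_region(
    normal_het: Sequence[bool],
    tumour_het: Sequence[bool],
    copy_number: float,
    max_het_fraction: float = 0.1,  # hypothetical threshold
    cn_tolerance: float = 0.3,      # hypothetical threshold
) -> str:
    """Classify one region using SNPs that are heterozygous in the matched normal."""
    # Only SNPs heterozygous in the normal sample are informative for LOH.
    informative = [t for n, t in zip(normal_het, tumour_het) if n]
    if not informative:
        return "uninformative"
    het_fraction = sum(informative) / len(informative)
    if het_fraction > max_het_fraction:
        return "heterozygosity retained"
    if abs(copy_number - 2.0) <= cn_tolerance:
        return "copy-neutral LOH"
    return "LOH with copy loss" if copy_number < 2.0 else "LOH with copy gain"
```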

MLL2 was the most frequently mutated gene in FL and among the most frequently mutated genes in DLBCL (Fig. 2). We confirmed MLL2 mutations in 31 of 35 FL patients (89%), 12 of 37 DLBCL patients (32%) and 10 of 17 DLBCL cell lines (59%), and in none of the eight normal centroblast samples we sequenced. Our analysis predicted that most of the somatic mutations observed in MLL2 were inactivating (91% were nonsense mutations, frameshifting indels or splice-site mutations), suggesting that MLL2 is a tumour suppressor of significance in NHL.

Recurrent point mutations in MEF2B

Our selective pressure analysis also revealed genes with stronger pressure for acquisition of amino acid substitutions than of nonsense mutations. One such gene was MEF2B, which had not previously been linked to lymphoma. In the RNA-seq cohort, 20 cases (15.7%) had MEF2B cSNVs and 4 cases (3.1%) had MEF2C cSNVs. All cSNVs detected by RNA-seq affected either the MADS box or MEF2 domains. To determine the frequency and scope of MEF2B mutations, we Sanger-sequenced exons 2 and 3 in 261 primary FL samples; 259 primary DLBCL tumours; 17 cell lines; 35 cases of assorted NHL (IBL, composite FL and PBMCL); and eight non-malignant centroblast samples. We also used a capture strategy (Supplementary Methods) to sequence the entire MEF2B coding region in the 261 FL samples, revealing six additional variants outside exons 2 and 3. We thus identified 69 cases (34 DLBCL, 12.67%; and 35 FL, 15.33%) with MEF2B cSNVs or indels, and observed no novel variants in the other NHL or non-malignant samples. Of these variants, 55 (80%) affected residues within the MADS box and MEF2 domains encoded by exons 2 and 3 (Supplementary Table 11; Fig. 3b). Each patient generally had a single MEF2B variant, and we observed relatively few (eight in total, 10.7%) truncation-inducing SNVs or indels. Non-synonymous SNVs were by far the most common type of change observed, with 59.4% of detected variants affecting K4, Y69, N81 or D83. In 12 cases, MEF2B mutations were shown to be somatic, including representative mutations at each of K4, Y69, N81 and D83 (Supplementary Table 12). We did not detect MEF2B mutations in ABC cases, suggesting that somatic mutations in MEF2B have a role specific to the development of GCB DLBCL and FL (Fig. 2).

Discussion

In our study of genome, transcriptome and exome sequences from 127 B-cell NHL cases, we identified 109 genes with clear evidence of somatic mutation in multiple individuals. Significant selection seems to act on at least 26 of these for the acquisition of either nonsense or missense mutations. To the best of our knowledge, the majority of these genes had not previously been associated with any cancer type. We observed an enrichment of somatic mutations affecting genes involved in transcriptional regulation and, more specifically, chromatin modification.

MLL2 emerged from our analysis as a major tumour suppressor locus in NHL. It is one of six human H3K4-specific methyltransferases29, all of which share homology with the Drosophila trithorax gene. Trimethylated H3K4 (H3K4me3) is an epigenetic mark associated with the promoters of actively transcribed genes. By laying down this mark, MLLs regulate the transcription of developmental genes, including the homeobox (Hox) gene family30, which collectively control segment specificity and cell fate in the developing embryo31,32. Each MLL family member is thought to target a different subset of Hox genes33; in addition, MLL2 is known to regulate the transcription of a diverse set of genes34. Recently, MLL2 mutations were reported in a small-cell lung cancer cell line35 and in renal carcinoma36, but neither report established the frequency of nonsense mutations affecting MLL2 in these cancers. Inactivating mutations in MLL2 or MLL3 were also recently reported in 16% of medulloblastoma patients37, further implicating MLL2 as a cancer gene.

Our data link MLL2 somatic mutations to B-cell NHL. The mutations reported here are likely to be inactivating, and in eight of the cases with multiple mutations we confirmed that both alleles were affected, presumably resulting in essentially complete loss of MLL2 function. The high prevalence of MLL2 mutations in FL (89%) matches the frequency of the t(14;18)(q32;q21) translocation, which is considered the most prevalent genetic abnormality in FL3. In DLBCL tumour samples and cell lines, MLL2 mutation frequencies were 32% and 59%, respectively, also exceeding the prevalence of the most frequent cytogenetic abnormalities, such as the various translocations involving 3q27, which occur in 25–30% of DLBCLs and are enriched in ABC cases38. Importantly, we found MLL2 mutated in both DLBCL subtypes (Fig. 2). Our analyses thus indicate that MLL2 acts as a central tumour suppressor in FL and both DLBCL subtypes.

The MEF2 gene family encodes four related transcription factors that recruit histone-modifying enzymes, including histone deacetylases (HDACs) and HATs, in a calcium-regulated manner. Although we detected truncating variants in MEF2 gene family members, our analysis suggests that, in contrast to MLL2, MEF2 family members tend to selectively acquire non-synonymous amino acid substitutions. In the case of MEF2B, 59.4% of all the cSNVs were found at four sites within the protein (K4, Y69, N81 and D83), and all four of these sites were confirmed to be targets of somatic mutation. D83 is affected in 39% of the MEF2B alterations, in which the charged aspartate is replaced by alanine, glycine or valine. Although we cannot yet predict the consequences of these substitutions for protein function, it seems likely that they affect the ability of MEF2B to facilitate gene expression and thus promote the malignant transformation of germinal centre B cells to lymphoma (Supplementary Discussion).

MEF2B mutations can be linked functionally to CREBBP and EP300 mutations, and to recurrent Y641 mutations in EZH2 (ref. 13). One target of CREBBP/EP300 HAT activity is H3K27, which is methylated by EZH2 to repress transcription. There is evidence that the action of EZH2 antagonizes that of CREBBP/EP300 (ref. 39). One function of MEF2 is to recruit either HDACs or CREBBP/EP300 to target genes40, and it has been suggested that HDACs compete with CREBBP/EP300 for the same binding site on MEF2 (ref. 41). At basal Ca2+ levels, MEF2 is bound by class IIa HDACs, which maintain histone tails in a deacetylated, repressive chromatin state42. Increased cytoplasmic Ca2+ levels induce the nuclear export of these HDACs, enabling the recruitment of HATs such as CREBBP/EP300 and facilitating transcription of MEF2 target genes. Mutation of CREBBP, EP300 or MEF2B may affect the expression of MEF2 target genes through reduced acetylation of nucleosomes near these genes (Supplementary Figure 5; Supplementary Discussion). In light of the recent finding that heterozygous EZH2 Y641 mutations enhance the overall H3K27 trimethylation activity of PRC2 (refs 43, 44), it is possible that mutations of MLL2 and EZH2 cooperate in reducing the expression of some of the same target genes. Our data indicate that (1) post-translational modification of histones is of key importance in germinal centre B cells and (2) deregulated histone modification due to these mutations, which is likely to result in reduced acetylation and enhanced methylation, acts as a core driver event in the development of NHL (Supplementary Figure 5).

Methods Summary

All samples analysed contained at least 50% tumour cells. Genomes, exomes and transcriptomes were sequenced on a combination of Illumina GAIIx and HiSeq 2000 instruments, with read lengths of 36 to 100 nucleotides. Exome capture was performed using the Agilent SureSelect Target Enrichment System Protocol (Version 1.0, September 2009). Reads were aligned using BWA45 and variants were identified using SNVmix46. Variants were manually reviewed in IGV and confirmed (where applicable) by PCR followed by either Sanger sequencing or Illumina re-sequencing. Structural rearrangements in genomes and transcriptomes were identified using ABySS47. Gene expression values used for subtype assignment were calculated as reads per kilobase of gene model per million mapped reads (RPKM)48, and subtypes were assigned using an adaptation of the method developed for Affymetrix expression array data49, trained with samples previously classified by this standard approach.
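
For reference, RPKM for a gene is 10^9 × C / (L × N), where C is the number of reads mapped to the gene, L is the length of the gene model in base pairs and N is the total number of mapped reads. A minimal sketch is shown below; how the gene-model length is defined (for example, as the union of exons) is an assumption not specified in this summary, and the counts in the usage line are hypothetical.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by kilobases (L / 1e3) and by millions of reads (N / 1e6)
    # is the same as multiplying the read count by 1e9 / (L * N).
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2 kb gene model in a library of 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
```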