Background Microsatellite instability in colon cancer implies favorable therapeutic outcomes after checkpoint blockade immunotherapy. However, the molecular nature of microsatellite instability is not well elucidated.
Methods We examined the immune microenvironment of colon cancer using assessments of the bulk transcriptome and the single-cell transcriptome focusing on molecular nature of microsatellite stability (MSS) and microsatellite instability (MSI) in colorectal cancer from a public database. The association of the mutation pattern and microsatellite status was analyzed by a random forest algorithm in The Cancer Genome Atlas (TCGA) and validated by our in-house dataset (39 tumor mutational burden (TMB)-low MSS colon cancer, 10 TMB-high MSS colon cancer, 15 MSI colon cancer). A prognostic model was constructed to predict the survival potential and stratify microsatellite status by a neural network.
Results Despite the hostile CD8+ cytotoxic T lymphocyte (CTL)/Th1 microenvironment in MSI colon cancer, a high percentage of exhausted CD8+ T cells and upregulated expression of immune checkpoints were identified in MSI colon cancer at the single-cell level, indicating the potential neutralizing effect of cytotoxic T-cell activity by exhausted T-cell status. A more homogeneous highly expressed pattern of PD1 was observed in CD8+ T cells from MSI colon cancer; however, a small subgroup of CD8+ T cells with high expression of checkpoint molecules was identified in MSS patients. A random forest algorithm predicted important mutations that were associated with MSI status in the TCGA colon cancer cohort, and our in-house cohort validated higher frequencies of BRAF, ARID1A, RNF43, and KM2B mutations in MSI colon cancer. A robust microsatellite status–related gene signature was built to predict the prognosis and differentiate between MSI and MSS tumors. A neural network using the expression profile of the microsatellite status–related gene signature was constructed. A receiver operating characteristic curve was used to evaluate the accuracy rate of neural network, reaching 100%.
Conclusion Our analysis unraveled the difference in the molecular nature and genomic variance in MSI and MSS colon cancer. The microsatellite status–related gene signature is better at predicting the prognosis of patients with colon cancer and response to the combination of immune checkpoint inhibitor–based immunotherapy and anti-VEGF therapy.
- tumor microenvironment
- tumor biomarkers
- computational biology
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.
Statistics from Altmetric.com
Colon cancer is a major cause of death worldwide. Approximately 25% of patients present with stage 4 tumors, and 25%–50% present with early-stage disease but develop metastatic disease.1 2 The prognosis has improved by early detection technologies, but the survival of patients with metastatic colon cancer remains poor.1 Thus, identifying an efficient therapy is urgently necessary for colon cancer. Colon cancer can be classified into mismatch-repair-deficient (dMMR) microsatellite instability (MSI-H) subtypes and mismatch-repair-proficient (pMMR) microsatellite stability (MSS) subtypes.3 dMMR-MSI colon tumors are characterized by a high tumor mutational burden (TMB) and high infiltration of activated CD8+ cytotoxic T lymphocytes (CTL) as well as activated Th1 cells with IFN-γ production.3 These features of MSI colon cancer make it a promising target for immunotherapy. A recent phase III study revealed pembrolizumab provided a clinically significant improvement in progression-free survival versus chemotherapy as first-line therapy for patients with MSI-H/dMMR metastatic colorectal cancer.4 Moreover, Keytruda (pembrolizumab) has been approved for the treatment of adult and pediatric patients with solid tumors with tissue TMB-high (≥10 mutations/megabase). Significant clinical benefit was obtained by anti-PD-1 therapy with pembrolizumab among patients with previously treated unresectable or metastatic MSI-H/dMMR non-colorectal cancer.5 These studies highlight the importance of microsatellite status and TMB value in immunotherapy. In contrast, MSS colon cancer has been long considered to have an inactive response to immune checkpoint inhibitors. The lack of immune infiltration and low TMB decrease the potential of obtaining benefits from immunotherapy and define MSI colon cancer as an “immune resistant” phenotype.6
The biological features and prognostic value of immune cell infiltration in tumors have been extensively reported in various tumor types.7–10 However, to date, most studies of colon cancer have been limited to bulk RNA-seq analysis.11 12 Considering the heterogeneous cell types in colon cancer, a single-cell transcriptome is urgently needed to analyze the difference in the molecular nature of immune cells in MSS and MSI colon cancer.
In the present study, we analyzed single-cell RNA-seq data, bulk RNA-seq data, and gene mutation data from public and in-house datasets. Through several computational biology methods, we depicted the molecular nature of immune cell populations from the bulk and single-cell levels in MSS and MSI colon cancer. The most important gene mutations associated with MSI tumors were identified. Then, we generated a differentially expressed gene profile from MSS and MSI colon cancer in order to identify a robust microsatellite status–related gene signature and thus predict the prognosis and differentiate between MSI and MSS tumors. A microsatellite status–related gene signature–based risk score (MSRS) was calculated for each patient. The association of the MSRS with immune cell populations, immune checkpoint inhibitors, and VEGF activities were further analyzed. A neural network using the expression profile of the microsatellite status–related gene signature was built and evaluated. We demonstrated that the microsatellite status–related gene signature is better at predicting the prognosis of patients with colon cancer and stratifying patients with a high risk of cancer-specific death, regardless of microsatellite instability.
Single cell-transcriptome files of GSE146771 and GSE98638 were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). GSE14333 is a colon cancer microarray dataset and was downloaded from GEO. The Cancer Genome Atlas (TCGA) RNA-seq transcriptome data and clinical information regarding colon cancer were downloaded via the UCSC Xena browser (https://xenabrowser.net/). The patient information is provided in online supplemental table 1 (online supplemental file 1). R software (V.3.5.3) and Python (V.3.6) were used for all the analyses in the manuscript. All bioinformatic methods are listed in online supplemental table 2 (Online supplemental file 1).
Immune cell abundance estimation
CIBERSORT R script was used to estimate the abundance of immune cell populations in the TCGA colon cancer cohorts by the bulk RNA-seq dataset.13 The cytotoxic T cell (CYT) score was obtained from cytolytic genes (calculated as the geometrical mean of PRF1 and GZMA).14 The observation of the T-cell infiltration score (TIS) was defined as the average of the standardized values for the subsets of T cells.15
ssGSEA analysis and GO analysis
Single-sample GSEA (ssGSEA) was applied to evaluate the enrichment scores for each sample. The “GSVA” package was used for ssGSEA analysis.16 The hallmark gene sets were downloaded from The Broad Institute. Tcm cell and Tem cell gene sets were from Bindea et al.17 The exhausted T score gene set was defined by Zheng et al.18 The correlation between the risk score and the enrichment scores was performed with Spearman’s coefficient. GO analysis was performed with the “clusterProfiler” package.19
Single-cell RNA-seq analysis
The single-cell transcriptome analysis for GSE146771 and GSE98638 was performed with “scanpy” in Python.20 The batch effect was removed by the BBKNN method. Nonlinear dimensional reduction was performed with the UMAP method. The gene features in each cluster were found by “scanpy”. The cell–cell interactions (CCIs) were calculated with the “SingleCellSignalR” package in R.21
Random forest algorithm for feature importance ranking
A random forest algorithm was applied on TCGA colon cancer NGS data to find the most important mutations associated with the microsatellite status in colon cancer. Briefly, the gene mutation dataset and microsatellite status were applied to find the most important gene mutations associated with microsatellite status in colon cancer. The “ranger” package was used to find the best hyperparameter in the regression process and build the model.22
DNA extraction, library preparation and targeted enrichment
NGS analyses were performed in a centralized clinical testing center (Nanjing Geneseeq Technology) according to protocols reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang University. DNA extraction, library preparation, and targeted capture enrichment were performed with previously described methods with minor modifications.23 Briefly, genomic DNA from the white blood cells was extracted using the DNeasy Blood & Tissue Kit (Qiagen) and was used as the normal control to remove germline variations. After deparaffinizing of Formalin-fixed paraffin-embedded (FFPE) samples with xylene, genomic DNA was extracted using the QIAamp DNA FFPE Tissue Kit (Qiagen). After quantification of DNA with Qubit 3.0 using the dsDNA HS Assay Kit (Life Technologies), a Nanodrop 2000 (Thermo Fisher) was used to evaluate DNA quality.
The KAPA Hyper Prep kit (KAPA Biosystems) was used to prepare libraries according to previously described methods.24 Briefly, a Covaris M220 instrument was used to shear 1–2 µg of genomic DNA into ~350 bp fragments. The KAPA Hyper DNA Library Prep kit (Roche Diagnostics) was used for end repair, A-tailing, and adaptor ligation of fragmented DNA, and Agencourt AMPure XP beads (Beckman Coulter) were used to perform size selection. PCR was performed to amplify DNA Libraries followed by purification using Agencourt AMPure XP beads.
Selective enrichment of 425 predefined cancer-related genes (Geneseeq Prime panel) was performed with a customized xGen lockdown probes panel (Integrated DNA Technologies). Blocking reagents were defined by Human cot-1 DNA (Life Technologies) and xGen Universal Blocking Oligos (Integrated DNA Technologies). Dynabeads M-270 (Life Technologies) and the xGen Lockdown Hybridization and Wash kit (Integrated DNA Technologies) were used to perform the capture reaction. PCR amplification was performed on captured libraries with KAPA HiFi HotStart ReadyMix (KAPA Biosystems). The KAPA Library Quantification kit (KAPA Biosystems) was used to quantify the purified library. Bioanalyzer 2100 was used to calculate the fragment size distribution.
Sequencing and bioinformatics analysis
Sequencing of the target enriched libraries was performed on the HiSeq4000 platform (Illumina) with 2×150 bp pair-end reads. bcl2fastq (v2.19) was used to demultiplex the sequencing data followed by analysis with Trimmomatic25 to remove low-quality (quality <15) or N bases. Then, alignment of the data to the hg19 reference human genome was performed with the Burrows-Wheeler Aligner (bwa-mem)26 followed by processing using the Picard suite (available at: https://broadinstitute.github.io/picard/) and the Genome Analysis Toolkit (GATK).27 VarScan228 and HaplotypeCaller/UnifiedGenotyper were used to call SNPs and indels in GATK, with the mutant allele frequency cut-off set as 0.5%. dbSNP and the 1000 Genome project were used to remove common variants. Germline mutations were filtered out by comparing with patient’s whole blood controls.
FACTERA29 and ADTEx30 were used to identify gene fusions and copy number variations, respectively. The log2 ratio cut-off for copy number gain and copy number loss was defined as 2.0 and 0.6, respectively, for tissue samples. TMB was defined as the number of somatic, coding, base substitution, and indel mutations per megabase of genome examined and was calculated as previously described.31 Briefly, all base substitutions, including nonsynonymous and synonymous alterations and indels in the coding region of targeted genes, were considered with the exception of known hotspot mutations in oncogenic driver genes and truncations in tumor suppressors.
A customized analysis algorithm was used to perform MSI testing. Briefly, the MSI sites within the panel (425-gene panel) coverage were defined with a sum of 52 mononucleotide repeats with a minimum of 15 bp repeats. The number of sequencing reads supporting each mononucleotide repeat length was tailed for each MSI site. The distribution of the length species within the tumor sample at the MSI site was compared with that summarized from a pool of normal samples. A significantly altered distribution of the length species in a site was considered unstable. A sample was defined as MSI if more than 40% of the evaluated site display instability.
Differentially expressed gene (DEG) analysis
The DEG analysis was applied with the “Limma” package on TCGA colon cancer transcriptome file.32 An empirical Bayesian method was applied to estimate the fold change between MSI-L and MSI-H or MSS and MSI-H using moderated t-tests. The adjusted p value for multiple testing was calculated using the Benjamini-Hochberg correction. The genes with an absolute log2 fold change greater than 1 and adjusted p value less than 0.05 were identified as DEGs.
Gene signature construction
The least absolute shrinkage and selection operator (LASSO) was applied to construct the microsatellite-related gene signature by TCGA colon cancer transcriptome data. L1-norm was used to penalize the weight of the features.33 A microsatellite-related gene signature–based risk score (MSRS) formula was established by including individual normalized gene expression values weighted by their LASSO Cox coefficients: .
Weighted correlation network analysis (WGCNA)
WGCNA was performed with the “WGCNA” package.34 A power of β=3 and a scale-free R2=0.95 were selected as soft-threshold parameters to ensure a signed scale-free co-expression gene network. A total of 12 non-gray modules were generated.
Tumor immune microenvironment in colon cancer with MSS or MSI status
The tumor immune microenvironment of colon cancer is depicted with the 22 immune cell populations by the TCGA cohort (see online supplemental table S1 for detailed information). A higher abundance of CD8 T cells, activated NK cells, and M1 macrophages was identified in MSI colon cancer than in MSS colon cancer. Then, we estimated the CYT activity, CD8:Treg ratio, IFN-γ expression signature, and T-cell infiltration score (TIS) in MSI and MSS colon cancer. As expected, a higher CYT activity (figure 1B), IFN-γ expression signature (figure 1C), TIS (figure 1D), and CD8:Treg ratio (figure 1E) were found in MSI colon cancer. CTLA4, PD-L1 (CD274), and PD1 (PDCD1) are three important biomarkers for immune checkpoint inhibitor (ICI)–based immunotherapy. The expression levels of these three checkpoint molecules were greater in MSI-H colon than in MSS and MSI-L colon cancer (figure 1F–1H). Hence, we also evaluated the exhausted T-cell score with the ssGSEA algorithm. Despite greater CYT activity in MSI-H colon cancer, a higher exhausted T-cell score was observed in MSI-H colon cancer (figure 1I).
Single-cell RNA-seq analysis unravels the heterogeneous cell populations with differentially expressed genes in MSI and MSS colon cancer
The tumor tissues of colon cancer contain several cell populations such as malignant cells, immune cells (eg, CD8+ T cells, CD4+ T cells, B cells, and myeloid cells), and fibroblasts. To identify the molecular features in MSI and MSS colon cancer at the single-cell level, we analyzed the single-cell RNA-seq transcriptome data from GSE146771. The UMAP method was used to perform nonlinear dimensional reduction. The cells were clustered by their cell types rather than by the patients (figure 1J and online supplemental figure S1A). The microsatellite status was also shown by a UMAP plot (figure 1K). The Leiden method was applied to identify the clusters. The results revealed that the malignant cells, fibroblasts, CD8+ T cells, and CD4+ T cells from MSI and MSS colon cancer tissues were identified into different clusters (online supplemental figure S1B), indicating the molecular difference between colon cancer with MSS and MSI. Several important tumor microenvironment (TME) genes were selected to analyze the heterogeneity in colon cancer tissues (online supplemental figure S1C). MS4A1 was enriched in a subpopulation of B cells whereas FOXP3 was expressed in a subgroup of CD4+ T cells. MS4A6A, PLAUR, and PECAM1 are known to be featured genes expressed in macrophages. High expression of these genes was found in myeloid cells. EPCAM is overexpressed in epithelial cancers associated with enhanced malignant potential. EPCAM was nearly expressed in all epithelial cells and malignant cells in colon cancer.
To further analyze the difference between MSS and MSI colon cancer at the single-cell level, CD8+ T cells, CD4+ T cells, B cells and malignant cells were extracted from the whole dataset. The UMAP method revealed the distribution of these cells and the microsatellite status in a two-dimensional plot (figure 2A and B). Several stem cell markers were selected to explore the heterogeneity in MSI and MSS tumors. ALDH1A1 was highly expressed in MSS tumors and lowly expressed in MSI tumors, and ICAM1 was highly expressed in MSI tumors and lowly expressed in MSS tumors (figure 2C). The featured gene expression in CD8+ T cells, CD4+ T cells, B cells and malignant cells from MSI and MSS colon cancer is shown in figure 2D. The CD8+ T cells and CD4+ T cells from MSI and MSS colon cancer had a significant difference in the most featured three genes. The CD8+ T cells from MSI colon cancer showed high KLRK1, PTPRCAP, and HLA-DRB5 expression (figure 2E); however, their counterparts from MSS colon cancer exhibited high expression of CCL5, NKG7, and GZMA (figure 2F). CD4, PTPRCAP, and PIGN were highly expressed in CD4+ T cells from MSI colon cancer (figure 2G), and LTB, CD2, and CD3D were highly expressed in CD4+ T cells from MSS colon cancer (figure 2H). The expression of three immune checkpoints (PD-L1, CTLA4, and PD1) are shown in online supplemental figure S2A–C . PD1 expression of single cells in MSI colon cancer was more homogeneous than its expression in MSS colon cancer. Most CD8+ T cells in MSI colon cancer had a high level of PD1 expression (indicated by the shape of the violin plot) whereas only a subgroup of CD8+ T cells in MSS colon cancer showed a high expression level of PD1. Using a reference with annotated T-cell status (exhausted and non-exhausted), we further analyzed the CD8+ T-cell status in MSS and MSS colon cancer. The results revealed a significantly higher percentage of exhausted CD8+ T cells in MSI colon cancer (69%) than in MSS colon cancer (36%) (p<0.001) (figure 2I).
The CCIs in MSI and MSS tumors were explored with the single-cell transcriptome data. The directions and the number of ligand–receptor interactions are shown in figure 3A,B. The intracellular interactions of PD1 on CD8+ T cells from MSI and MSS tumors were therefore identified by the CCIs (figure 3C,D). A more complicated network with involvement of HLAs corresponding to MHC class II (DP, DQ, and DR) was found in MSI tumors. Greater expression of MHC class II molecules was found in MSI tumors than in MSS tumors, and ubiquitous expression of MHC class I molecules was observed between MSI and MSS tumors (figure 3E and online supplemental figure S3). ssGSEA suggested that MSI tumors harbored expression signatures of “PD-1 signaling”, “T-cell receptor signaling”, “graft-versus-host disease”, and “allograft rejection” (figure 3F). These findings indicated that there was high MHC-II expression consistent with a pro-immune/antitumor response in MSI tumors.
Association between microsatellite status and gene mutations in colon cancer
The random forest algorithm was applied to determine the importance of gene mutations associated with microsatellite status in colon cancer. Thirty of the most important gene mutations were identified in the TCGA cohort (figure 4). The results indicated that BMPR2, RNF43, and ANK3 were the most important gene mutations associated with the microsatellite status. MSI-H colon cancer tissues had a higher percentage of BMPR2 RNF43 and ANK3 mutations than MSI-L and MSS colon cancer tissues in the TCGA colon cancer cohort. To further validate the model mentioned previously, the mutation information of 39 TMB-low MSS colon cancer, 10 TMB-high MSS colon cancer, and 15 MSI colon cancer tissues were collected with the 425-panel as our in-house cohort (figure 5A). BRAF, ARID1A, RNF43, and KM2B were involved in the 425-panel. Our data revealed a significantly high frequency of BRAF, ARID1A, RNF43, and KM2B mutations in MSI colon cancer, which confirmed the random forest model. Moreover, the detailed mutation types of BRAF, ARID1A, RNF43, and KM2B are shown in figure 5B.
Microsatellite status–related gene signature predicts the prognosis and clinical outcome of patients with colon cancer
DEG analysis was performed for MSS and MSI colon cancer in the TCGA cohort. The volcano plot in figure 6A,B shows the DEGs among MSS and MSI-H, MSI-L, and MSI-H colon cancer. The Venn diagram in figure 6C shows that a total of 1202 genes were involved in the microsatellite status–related DEGs (online supplemental file 1). GO analysis revealed that the response to INF-γ, leukocyte chemotaxis, pattern specification process, organic anion transport, and other biological processes were enriched with the microsatellite status–related DEGs (figure 6D). The 1202 microsatellite status–related DEGs were used to build a prognostic model to predict the cancer-specific survival (CSS) of patients with colon cancer. LASSO Cox regression penalized the unimportant features in the regularization process, and 17 features (LOC100272228, HOXC8, TMEM195, TNFAIP2, TRIM58, PLAG1, MTERF, APOD, SYT12, HOXC11, ENO2, HBA1, HOXC4, TNFRSF19, SULT1B1, B3GNT8, and ATOH1) were finally selected as the microsatellite status–related gene signature (figure 6E, online supplemental table S3). The expression levels of the 17 genes in MSS and MSI colon cancer of different TNM stages are shown in (online supplemental figure 4-5). The MSRS for each patient was calculated based on the expression levels of the 17 genes with the coefficient in the model. The risk score formula was established as follows: . The patients with a higher MSRS had a worse CSS than patients with a lower MSRS (HR 16.5, p<0.001, best cut-off method with cut-off value 2.93) (figure 6F). Time-dependent ROC analysis further confirmed the higher accuracy of the MSRS (mean AUC values higher than 0.8 during the follow-up period) than other clinicopathological traits.
The relationship of treatment outcome and the MSRS indicated that patients with a lower MSRS had a higher fraction of complete remission (CR) (93% vs 70%) and a lower percentage of progressive disease (PD) (3% vs 22%) status (figure 7A). The association of the microsatellite status–related signature-based RS with the immune populations were calculated with Pearson’s coefficient (figure 7B), indicating a significant correlation with CD4+ T cells (negative) and memory B cells (positive). Next, we extended the analysis to include the immune checkpoint molecules, including the B7-CD28 family (CD274, CD276, CTLA4, HHLA2, ICOS, ICOSLG, PDCD1, PDCD1LG2, TMIGD2, VTCN1), TNF superfamily (BTLA, CD27, CD40, CD40LG, TNFRSF18, TNFRSF4, TNFRSF9, TNFSF4, TNFSF9) and others (ENTPD1, FGL1 HAVCR2, IDO1, LAG3, NCR3, NT5E, SIGLEC15) (figure 7C). In total, seven immune checkpoint molecules, including ENTPD1, SIGLEC15, TNFRSF4, TNFSF4, TNFSF14, PDCD1LG2, and HAVCR2, were upregulated in the high MSRS group (Cor >0.25). An external cohort (GSE14333) was used to confirm the results. The patients with a low MSRS had a more favorable survival outcome than patients with a high MSRS (online supplemental figure S6A). The association between the MSRS and expression pattern of immune checkpoint molecules in the GSE14333 showed the same tendency as it did in the TCGA cohort (online supplemental figure 6SB). For instance, the association of TNFSF4, HAVCR2, and ENTPD1 with the MSRS are shown in (online supplemental figure 6C–E). The counterparts in the TCGA cohort are shown in (online supplemental figure 6SF–H). As recently the combination of regorafenib plus nivolumab has shown a manageable safety profile and encouraging antitumor activity in colon cancer,37 we further investigated the association of the MSRS and VEGF activities in MSS, MSI-L, and MSI-H tumors. The results indicated a strikingly higher association of the MSRS and VEGF in MSI-H tumors (0.53) than in MSI-L tumors (0.34) and MSS tumors (0.14) (figure 7D). Several studies have discovered that PD-1 blockade therapy reactivates effector T cells and also promotes the proliferation of Tcm cells, improving antitumor immunity.38 39 Thus, the association between Tcm activity/Tem activity/Tscm activity and exhausted T-cell score/MSRS was analyzed. The results indicated more significant association between Tcm activity/Tem activity/Tscm activity and MSRS/exhausted score in MSI tumor compared with the association in MSS tumor (online supplemental figure S7A–F).
ssGSEA analysis was performed to evaluate the biological meaning underlying the MSRS (online supplemental figure S8A). Myogenesis, apical junctions, hedgehog signaling, TGF-β signaling, and epithelial mesenchymal transition were the hallmarks with most significantly correlated with the MSRS. To further confirm the results from the ssGSEA, WGCNA was performed. With a power of β=3 set as the optimal soft threshold to construct a scale-free signed network (online supplemental figure S8B,C), a total of 12 nongray modules were identified (online supplemental figure S8D,E). Among these nongray modules, two modules (turquoise and magenta) with the highest absolute correlation values with the MSRS were selected for further analysis. The genes from the two modules were applied to GO analysis. The results indicated that extracellular matrix reorganization–related enrichments (online supplemental figure S8E) and cell division–related enrichments (online supplemental figure S8F) were related to the genes involved in the turquoise and magenta modules, which confirmed the results from the ssGSEA analysis.
Subgroup survival analysis
The MSRS serves as a promising marker to predict DSS in different subgroups of patients with colon cancer in the TCGA cohort, including TNM stage I–II (HR 6.37, p<0.001) (online supplemental figure S9A), TNM stage III–IV (HR 9.36, p<0.001) (online supplemental figure S9B), age <66 years (HR 10.19, p<0.001) (online supplemental figure 9C), age >66 years (HR 9.76, p<0.001) (online supplemental figure 9D), MSI (HR 18.39, p<0.001) (online supplemental figure S9E), MSS (HR 6.24, p<0.001) (online supplemental figure S9F), female gender (HR 11.15, p<0.001) (online supplemental figure S9G), and male gender (HR 11.37, p<0.001) (online supplemental figure S9H).
Neural network construction to identify MSS and MSI colon cancer
To further use the microsatellite status–related gene signature, we built a neural network with the gene expression pattern of the microsatellite status–related gene signature to stratify MSS and MSI colon cancer. Figure 7E shows a schematic diagram of the neural network. The expression profile of the microsatellite status–related gene signature from the TCGA colon cancer cohort was split into the training dataset (2/3) and testing dataset (1/3). With increasing epochs in the training set, the loss value decreased in the testing set (figure 7F). The confusion matrix shows that all tumors were classified accurately in the testing set (figure 7G).
Immunotherapy has achieved long-term durable responses for multiple types of previously difficult-to-treat solid cancers, such as lung cancer and melanoma.40 MSI-H tumors cancer have been approved for the application of ICI such as pembrolizumab.3 Nivolumab (anti-PD1) plus ipilimumab (anti-CTLA-4) has demonstrated high response rates, improving progression-free survival and overall survival at 12 months, and is a promising treatment option for patients with dMMR MSI-H metastatic colorectal cancer.41 In contrast, MSS colon cancer seems to obtain limited benefits from immunotherapy. Chalabi et al 42 tested ipilimumab plus nivolumab in early stage dMMR and pMMR colon cancers. Major pathological responses were observed in 7/7 (100%) dMMR tumors, with 4/7 (57%) complete responses, and no major pathological responses were found in pMMR tumors. Interestingly, significant increases in T-cell infiltration, particularly in CD8+ T cells, were seen post-treatment in both pMMR and dMMR tumors, indicating the partial activation of CD8+ T cell–mediated immunity in pMMR tumors after immunotherapy without successful pathological responses.42 In our analysis on public single-cell transcriptome data from GEO database, we found a small subset of CD8+ T cells with high expression of ICI (eg, CTLA4 and PD1) in MSS tumors and generously high PD1-expressing CD8+ T cells in MSI tumors, which may explain the increased T-cell infiltration after ipilimumab plus nivolumab treatment in both MSI and MSS tumors. dMMR-MSI colon cancer is characterized by the abundant infiltration of CD8+ tumor-infiltrating lymphocytes, CD4+ T helper cells and activation of type I interferon signaling.43 In our analysis, we confirmed the enrichment of CD8+ T cells, CD4+ T cells, and M1-macrophages in MSI-H colon cancer. A high CYT activity, ratio of CD8+:Treg, IFNG expression, and TIS were observed in MSI-H colon cancer. CD8+ T-cell infiltration into the tumor microenvironment has been long considered as a favorable factor for clinical outcomes due to its role in tumor-killing ability. Thus, we analyzed the T-cell status in MSI colon cancer. T-cell receptors (TCRs) on the surface of T cells can bind with complexes of peptides with major histocompatibility complex (MHC) class I molecules presented on the surface of all cells.44 Nevertheless, the activation of T cells does not rely on recognition of peptide–MHC class I complexes by the TCR alone. A range of coinhibitory or costimulatory signals tune the response of T cell, which tumor cells exploit to escape destruction.45 Thus, coinhibitory receptors such as CTLA4 and PD1 on T cells and the ligand of PD1 (PD-L1) on tumor cells are the normal targets of ICI.46 We found that in MSI colon cancer, CTL4, CD274, and PD-L1 were all highly expressed, which neutralized the high CYT activity in MSI colon cancer. ssGSEA analysis with exhausted T-cell hallmarks further confirmed the conclusion.
Then, we analyzed the molecular signature of MSI and MSS colon cancer at the single-cell level. The most significant featured genes were distinct in different immune cells and malignant cells from MSS and MSI colon cancer, which indicated the influence of microsatellite status on the tumor microenvironment. With single-cell transcriptome data with CD8+ T-cell annotation information (exhausted vs non-exhausted) as a reference, we further annotated the CD8+ T cells in MSI and MSS colon cancer. As a significantly higher percentage of exhausted CD8+ T cells were identified in MSI colon cancer than in MSS colon cancer, MSI tumors obtain a promising effect from ICI targeting PD1 (pembrolizumab and nivolumab), PDL1 (atezolizumab and durvalumab), and CTLA4 (ipilimumab). Characterizing cell–cell communication by analysis of ligand–receptor interactions in the tumor microenvironment can be performed by single-cell RNA-seq technology, which provides single-cell transcriptomes to define cell heterogeneity and also identifies profiles of ligand–receptor interactions and cell–cell signaling networks. In our single-cell transcriptome analysis on public single-cell transcriptome data, a complicated intracellular regulatory network that involved MHC-II molecules was identified in MSI tumors. As MHC-II expression is a functional antigen-presenting molecule that can promote CD4+ T-helper cell aid to the antitumor milieu,47 the MHC-II involvement in the PD1 network implies the importance of CD4+ T cells in antitumor immunity and ICI-based immunotherapy for MSI tumors.
Impaired DNA MMR facilitates the occurrence of frameshift mutations by insertions or deletions as well as single-base mismatches that can be point mutations in coding regions.10 Thus, dMMR MSI-H colon cancer is characterized by a high tumor mutation burden. The high mutational burden in MSI colon tumor creates many tumor-specific neoantigens, typically 10 to 50 times those of MSS colon cancer.43 The neoantigens may be processed and presented on MHC and recognized by T cells. This may be one potential reason for the high TIL infiltration in MSI colon cancer. Thus, it would be important to identify the important gene mutations that are associated with MSI tumors. BMPR2, RNF43, ANK3, and several other gene mutations were identified by a random forest algorithm in our analysis from the TCGA colon cancer dataset. One study revealed that a frameshift mutation of BMPR2 gene occurs in colon cancer with MSI-H.48 BMP signaling acts as a tumor suppressor in gastrointestinal tumorigenesis. This mutation of BMPR2 leads to the loss of BMPR2 expression and downregulation of BMP signaling in colon cancer.49 Another study demonstrated that truncating mutations of RNF43 were linked to the MSI-H phenotype in colon cancer.50 RNF43 acts as an E3 ubiquitin-protein ligase and can negatively regulate the Wnt signaling pathway.51 RNF43 mutation has been related to the sensitivity to small-molecule Wnt-specific acyltransferase porcupine inhibitors.52 Both our in-house dataset and the results from Giannakis et al 50 confirmed the high frequency of RNF43 in MSI colon cancer. A BRAF mutation is involved in impaired mismatch repair, and the mutation frequency, as expected, is higher in TMB-high MSS colon cancer and MSI-H colon cancer than in TMB-low MSS colon cancer. ARID1A is a tumor suppressor that is involved in transcriptional regulation, proliferation, and chromatin remodeling.53 The high frequency of ARID1A in MSI colon cancer from our in-house cohort and the study by Cajuso et al 53 further confirmed the accuracy of our random forest model.
In the next step, we analyzed the gene expression patterns in MSI tumors and MSS tumors from TCGA colon cancer dataset. A total of 1202 DEGs were found. The GO analysis by DEGs indicated enrichments in the IFN-γ response, which confirmed the previous results. As microsatellite status is not a good indicator of prognosis, we developed a gene signature–based risk score, the MSRS, to predict the cancer-specific survival of patients with colon cancer. The MSRS serves as an indicator of tumor-specific survival beyond MSI staging and TNM staging in total cohorts and subgroup analyses (eg, microsatellite status, TNM stage, age, and gender). Thus, we demonstrated that the MSRS is better at defining the prognosis of patients with colon cancer and identifying patients at a high risk of tumor recurrence, regardless of microsatellite instability. Furthermore, the MSRS also gives an indicator of tumor outcome and immunotherapy response. Patients with a lower MSRS have high probability of complete remission and active immunotherapy response. Pearson’s coefficients indicated a greater correlation between the MSRS and CD4+ T cells than between the MSRS and CD8+ T cells. One potential explanation is the “exhausted” status of CD8+ cytotoxic T cells in colon cancer. As we discussed previously, MSI colon cancer cells and a small subgroup of MSS colon cancer cells have a high expression level of coinhibitory factors such as CTLA4, PD1, and PD-L1, which impair the immune-killing activity of CD8+ T cells. The molecular feature underlying the MSRS implies the difference in the molecular nature in MSS and MSI colon cancer.
The benefits of CD8+ T cell–mediated antitumor responses might be blocked by high activity of the VEGF pathway, which mediates immune tolerance and also restricts T-cell infiltration into the tumor.54 In addition, one study revealed that VEGF-A induces the expression of the transcription factor TOX in T cells to drive an exhaustion-specific transcription program in T cells.55 Hence, it is logical to consider that combined blockade of PD-1 and VEGF-A may restore the antitumor functions of T cells and aid in better control of tumors. In our analysis, the MSRS showed a significantly higher correlation with VEGF pathway activity in MSI-H patients than in MSI-L patients and MSS patients. Thus, we conferred that the combination of a VEGF inhibitor (bevacizumab) and atezolizumab may be promising for patients with MSI-H tumors rather than for patients with MSS tumors, especially for patients with a high MSRS. Moreover, more significant association was found between Tcm activity/Tem activity/Tscm activity and MSRS/exhausted score in MSI tumor compared with the association in MSS tumor. Tcm/Tem presence in the tissue suggests that there is tumor antigen recognition. High tumor antigen recognition can lead to T-cell exhaustion. Therefore, a positive association between Tcm/Tem and exhaustion is plausible. Currently, Tscm is discussed as being those T cells that are precursor to exhausted T cells.56 57 They are thought to be the T cells that respond to checkpoint blockade. In our analysis, Tscm activity also significantly correlated with exhausted score as expected, especially in MSI-H tumors. The difference of association of the exhausted score and Tcm activity/Tem activity/Tscm activity in MSI and MSS tumor indicates the discrepancy of molecular pathways involved in the response to ICI-based immunotherapy. Furthermore, Tcm activity/Tem activity/Tscm activity showed a significant association with MSRS in MSI patients, implying the potential benefits of ICI-based immunotherapy on MSI tumor patients with high MSRS.
In summary, the featured gene expression in CD8+ T cells, CD4+ T cells, B cells, and malignant cells from MSI and MSS colon cancer showed a clear difference of the molecular nature and immune cell activities. A more homogeneous highly expressed pattern of PD1 was observed in CD8+ T cells from MSI colon cancer; however, a small subgroup of CD8+ T cells with high expression of checkpoint molecules was identified in MSS patients. BMPR2, RNF43, and ANK3 were the most important gene mutations associated with the microsatellite status in colon cancer. We also built a microsatellite status–related gene signature with which we can predict the microsatellite status with our neural network framework. The MSRS was calculated, which serves as an indicator of prognosis beyond MSI staging and the response to the combination of immunotherapy and anti-VEGF therapy.
We would like to thank Dr. Haowen Deng for the discussions and suggestions in building neural network.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
XB and HZ contributed equally.
Contributors WF, PZ, and FL supervised the project. XB and HZ designed the workflow and performed the bioinformatic analysis. WF, PZ, and FL led the study. WW, SC, XD, XZ, QF, ZT, LL, and YZ performed material preparation and data collection. XB and HZ wrote the first draft. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding This research was supported by Major Scientific Project of Zhejiang Province (2017C03028) and National Natural Science Foundation of China Program (Grant No. 81602128).
Competing interests None declared.
Patient consent for publication Not required.
Ethics approval The studies involving human participants were reviewed and approved by the ethics committee of the First Affiliated Hospital, Zhejiang University School of Medicine.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement NGS data are available on reasonable request.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.