Background In cancer therapy, higher-resolution tumor-agnostic biomarkers that predict response to immune checkpoint inhibitor (ICI) therapy are needed. Mutation signatures reflect underlying oncogenic processes that can affect tumor immunogenicity, and thus potentially delineate ICI treatment response among tumor types.
Methods Based on mutational signature analysis, we developed a stratification for all solid tumors in The Cancer Genome Atlas (TCGA). Subsequently, we developed a new software (Genomic Subtyping and Predictive Response Analysis for Cancer Tumor ICi Efficacy, GS-PRACTICE) to classify new tumors submitted to whole-exome sequencing. Using existing data from 973 pan-cancer ICI-treated cases with outcomes, we evaluated the subtype-response predictive performance.
Results Systematic analysis of TCGA samples identified eight tumor genomic subtypes, which were characterized by features represented by smoking exposure, ultraviolet light exposure, APOBEC enzyme activity, POLE mutation, mismatch repair deficiency, homologous recombination deficiency, genomic stability, and aging. The first five subtypes were presumed to form an immune-responsive group acting as candidates for ICI therapy because of their high expression of immune-related genes and enrichment in cancer types with FDA approval for ICI monotherapy. In the validation cohort, the samples assigned by GS-PRACTICE to the immune-responsive subtypes were significantly associated with ICI response independent of cancer type and tumor mutational burden (TMB) status.
Conclusions The new tumor subtyping method can serve as a tumor-agnostic biomarker for ICI response prediction and will improve decision making in cancer treatment.
- tumor biomarkers
- genetic markers
Data availability statement
Data may be obtained from a third party and are not publicly available.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Mismatch repair deficiency (MMRd) and high tumor mutational burden (TMB-high) are proposed as tumor-agnostic predictive biomarkers for immune checkpoint inhibitors (ICIs), but their frequencies vary among tumor types.
In a limited number of cancer types, including non-small-cell lung cancer and melanoma, mutagenic processes other than MMRd and the mutation signatures reflecting such processes have been reported to be associated with ICI sensitivity.
WHAT THIS STUDY ADDS
From the systematic analysis of mutational signatures in all solid tumors of The Cancer Genome Atlas, we developed a new method to classify whole-exome sequenced tumors into eight genomic subtypes with different immunogenicity.
In validation data including multiple cancer types, the classified tumor subtypes significantly correlated with ICI efficacy, independent of cancer type and TMB status.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
Our method provides a new pan-cancer biomarker for predicting ICI efficacy orthogonal to TMB status.
The results suggest that the mutational processes underlying carcinogenesis strongly affect tumor immunogenicity, leading to differences in ICI treatment response among tumor types.
The advent of immune checkpoint inhibitors (ICIs) has provided substantial opportunities in cancer treatment. However, the proportion of patients who benefit from ICIs varies widely by cancer type,1 and tumor-agnostic biomarkers to identify (un)responsive subsets are strongly desired. A recently established predictive biomarker is the loss of mismatch repair protein expression on immunohistochemistry or microsatellite instability (MSI-high), which indicates mismatch repair deficiency (MMRd) status.2 MMRd tumors are considered to be highly sensitive to ICI because they carry a large number of tumor-specific neoantigens.3 Another tumor-agnostic biomarker recently approved by the Food and Drug Administration (FDA) is tumor mutational burden (TMB)-high status, where tumors have 10 or more mutations per megabase calculated from the FoundationOne CDx assay.4 Despite the approval of TMB as a biomarker, a substantial number of cases have modest TMB but still respond to ICI,5 6 and more sophisticated methods for identifying such tumors need to be developed.
Comprehensive gene mutation analysis in cancer enabled by high-throughput next-generation sequencing has revealed that even neutral somatic mutations, previously thought to be ‘passenger’ mutations, exhibit reproducible patterns of change, or mutational signatures, depending on the underlying endogenous and exogenous mutagenic processes.7 8 Certain mutational signatures are known to be associated with tumor immunogenicity,9–11 suggesting that differences in the background mutational processes may play an important role in antitumor immunity.
To advance oncology patient care by leveraging the signature-immunogenicity relationship, we report the development of a computational framework to classify tumors beyond their tissue origin. The tool is subsequently challenged to predict response to ICI independent of cancer type and TMB status using a large external patient dataset, demonstrating its feasibility and position to complement FDA-approved TMB analyses.
Materials and methods
The Cancer Genome Atlas data
Clinical information of all tumors except diffuse large B-cell lymphoma, acute myeloid leukemia, and thymoma in The Cancer Genome Atlas (TCGA) studies was obtained from the cBioPortal (https://www.cbioportal.org/) and the Broad GDAC websites (https://gdac.broadinstitute.org/). Among these, 9794 cases, whose somatic mutation profiles analyzed by Mutect212 were available on the GDC portal (https://portal.gdc.cancer.gov/), were included in this study. We also obtained the other somatic mutation profiles called by three different variant callers (see the Methods section) and gene expression profiles from a previous report.13 The annotations of germline mutations and gene promoter methylations were obtained from previous reports.14 15 The contribution values to COSMIC (V.2) 30 mutational signatures (https://cancer.sanger.ac.uk/signatures/signatures_v2) of each sample were calculated using MutationalPatterns.16 The annotation of cancer types with FDA approval for ICI monotherapy was based on a previous report.17 The response rates for ICI monotherapy for each tumor type were obtained from previous reports.1 18 19
Pan-Cancer Analysis of Whole Genomes consortium (PCAWG), Clinical Proteomic Tumor Analysis Consortium (CPTAC), National Bioscience Database Center (NBDC), and cBioPortal datasets were obtained from their databases (online supplemental table S2). For the ICI-treated cohorts, samples collected from metastatic tumors and those with a history of ICI treatment at sample collection were excluded. A total of 973 patients from 13 datasets were included in the analysis (online supplemental table S3 and figure S15).
Statistical analyses were mainly performed in Python (V.3.7.4). The Mann-Whitney U test, χ2 test, and Spearman’s rank correlation coefficient test were performed using SciPy (V.1.6.1); survival analyses, including Kaplan-Meier curves, the log-rank test, and Cox proportional hazard regression, using Lifelines (V.0.25.10) and StatsModels (V.0.12.2); and machine learning analyses using Scikit-learn (V.0.24.1). The Venn diagram, the Jonckheere-Terpstra test, and the Passing-Bablok regression analysis were performed using the ‘VennDiagram’ (V.1.6.20), ‘clinfun’ (V.1.0.15), and ‘mcr’ (V.1.2.2) packages in R. We considered p<0.05 to be statistically significant.
Details are provided in online supplemental data.
Identification of eight genomic subtypes based on mutational signature analysis
Based on Mutect2-derived12 mutation annotations from whole-exome sequencing (WES) data, score profiles of COSMIC (V.2) mutational signatures were derived for each solid tumor in TCGA (n=9794). Eight tumor groups were obtained after clustering logarithm-transformed profiles (figure 1A). Based on the enrichment of signatures with proposed etiologies (online supplemental table S1), seven of these subtypes were labeled as groups associated with smoking (SMK), ultraviolet light (UVL), APOBEC (APB), DNA polymerase epsilon deficiency (POL), mismatch repair deficiency (MRD), homologous recombination deficiency (HRD), and aging (AGE). The remaining group that showed no specific accumulation of mutation signatures and the lowest number of mutations was assigned the genomic stability (GNS) subtype.
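Signature score profiles of this kind are usually fitted from a 96-channel catalog of single-base substitutions, in which each mutation is described by its substitution class and its flanking 5' and 3' bases. The following is a minimal sketch of how such a catalog can be built from a list of variants. It is not the pipeline used in this study (signature contributions were calculated with MutationalPatterns), and the function names `to_channel` and `build_catalog` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Six substitution classes, written from the pyrimidine (C or T) strand.
SUBSTITUTIONS = [("C", "A"), ("C", "G"), ("C", "T"),
                 ("T", "A"), ("T", "C"), ("T", "G")]

# All 96 channels, e.g. "A[C>T]G": 6 substitutions x 4 5' bases x 4 3' bases.
CHANNELS = [f"{five}[{ref}>{alt}]{three}"
            for (ref, alt) in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def to_channel(trinucleotide, alt):
    """Map a substitution to its pyrimidine-normalized 96-channel label.

    trinucleotide: reference bases at positions -1, 0, +1 (e.g. "TGC").
    alt: the mutated base at the middle position.
    """
    tri, alt = trinucleotide.upper(), alt.upper()
    if tri[1] in "AG":  # purine reference: describe it on the opposite strand
        tri, alt = reverse_complement(tri), COMPLEMENT[alt]
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def build_catalog(mutations):
    """Count mutations per channel; returns a 96-long list ordered as CHANNELS."""
    counts = Counter(to_channel(tri, alt) for tri, alt in mutations)
    return [counts.get(ch, 0) for ch in CHANNELS]


# Illustrative input only: (reference trinucleotide, alternate base) pairs.
catalog = build_catalog([("TGC", "A"), ("ACG", "T"), ("ACG", "T")])
```

The resulting count vector is the kind of input that a signature-fitting step would then decompose into contributions to the 30 COSMIC (V.2) signatures.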
In terms of clinical information, age, gender, stage, and mortality differed considerably among the subtypes (figure 1A, online supplemental figure S1A). The proportion of patients with a smoking history was highest in the SMK group. Molecularly characterized groups also contained enriched annotations, including high POLE mutations in the POL group, as well as MMR mutations, MLH1 methylation, and MSI-high status in the MRD group. The HRD group contained characteristic BRCA alterations.20 The distribution of genomic subtypes differed among tumor types (figure 1B, online supplemental figure S1B,C). Extensive analytics of each subtype are provided in online supplemental figures S1D and S2–S8.
Transcriptomes of genes associated with tumor immune response were assessed. Genes representing the infiltration of cytotoxic CD8+ T cells (CD8A, GZMB, and IFNG) and genes related to ICI response (CXCL9 and CXCL13)21 were upregulated in the five subtypes (SMK, UVL, APB, POL, MRD) relative to the others (HRD, GNS, AGE). The CYT score22 and GEP score23 related to ICI response were also higher in the same five subtypes. Tumors in the five subtypes also more frequently originated from cancer types with FDA approval for ICI monotherapy (figure 1A). Further, when the proportion of samples assigned to the five subtypes was scored per tumor type, the score was strongly correlated with the previously reported objective response rate to ICI monotherapy for that tumor type (figure 1C).1 18 19 The SMK/UVL/APB/POL/MRD subtypes are thus presumed to predict a positive response to ICI administration, and are hereafter termed immuno-responsive genomic subtypes (irGS).
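The per-tumor-type comparison described above can be sketched as follows: compute the fraction of samples in the five subtypes for each tumor type, then rank-correlate those fractions with the reported response rates. This is a standard-library illustration only; the study used SciPy for Spearman's test, and the helper names here are hypothetical.

```python
IRGS_SUBTYPES = {"SMK", "UVL", "APB", "POL", "MRD"}


def irgs_fraction(subtypes):
    """Fraction of samples in one tumor type assigned to the five subtypes."""
    return sum(s in IRGS_SUBTYPES for s in subtypes) / len(subtypes)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

In use, `x` would hold one irGS fraction per tumor type and `y` the corresponding published objective response rate; both inputs need at least two distinct values for the correlation to be defined.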
Development of Genomic Subtyping and Predictive Response Analysis for Cancer Tumor ICi Efficacy
A software tool embedding machine learning was developed to stratify newly sequenced tumors into the eight genomic subtypes derived above (figure 2A). First, hierarchical clusters were again derived using each of three alternative variant calling schemes (online supplemental figure S9A, see the Methods section). High concurrence with analyses based on Mutect2 was observed (online supplemental figure S9A,B). To extract samples typical for each subtype as a training dataset, samples with matching classification results in at least three of the four methods, including concomitant classification with Mutect2, were selected and used for subsequent analysis (online supplemental figure S9B,C). The resulting 7181 samples and their 30 COSMIC signature scores were used as features to construct k-nearest neighbor, support vector machine, random forest, and logistic regression classifiers with optimized hyperparameters (see the Methods and online supplemental figure S10A). All classifiers showed more than 95% subset accuracy (exact match ratio) in multilabel classification (online supplemental figure S10B), yielding a robust eight-class ensemble-based stratification tool.
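The training-set selection rule described above (agreement in at least three of the four variant-calling schemes, with Mutect2 among the agreeing calls) can be sketched as a simple filter. This is an illustration under stated assumptions: the input layout, the function name, and the caller names other than Mutect2 are hypothetical placeholders, since the other three callers are not named in this section.

```python
def select_training_samples(calls, reference="mutect2", min_agree=3):
    """Keep samples whose reference-caller subtype is shared by enough callers.

    calls: {sample_id: {caller_name: subtype_label}}, one label per caller.
    Returns {sample_id: subtype_label} for the retained samples.
    """
    selected = {}
    for sample_id, by_caller in calls.items():
        ref_label = by_caller[reference]
        # Count callers (including the reference itself) that agree with it.
        n_agree = sum(label == ref_label for label in by_caller.values())
        if n_agree >= min_agree:
            selected[sample_id] = ref_label
    return selected


# Illustrative labels only; "caller_b" to "caller_d" are placeholder names.
calls = {
    "s1": {"mutect2": "SMK", "caller_b": "SMK", "caller_c": "SMK", "caller_d": "APB"},
    "s2": {"mutect2": "HRD", "caller_b": "AGE", "caller_c": "AGE", "caller_d": "AGE"},
    "s3": {"mutect2": "UVL", "caller_b": "UVL", "caller_c": "GNS", "caller_d": "AGE"},
}
training = select_training_samples(calls)  # keeps only "s1"
```

Sample `s2` is dropped even though three callers agree, because the agreeing group does not include Mutect2.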
For new query inputs of somatic mutation profile scores derived from tumor sequencing, each of the four classifiers is executed, and predictions are deemed consistent when at least three of the four resultant classifications concur; otherwise a classification of undeterminable (UND) is assigned to the sample. Further, when a majority prediction is one of the SMK, UVL, APB, POL, or MRD subtypes, the subtype is additionally classified as irGS; otherwise it is labeled as non-irGS. The prediction system, GS-PRACTICE (acronym for ‘Genomic Subtyping and Predictive Response Analysis for Cancer Tumor ICi Efficacy’), is publicly available on GitHub (https://github.com/shirotak/GS-PRACTICE).
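The consensus rule described above can be sketched in a few lines. This is not the GS-PRACTICE source code; the function names are hypothetical, and the handling of UND samples (mapped to non-irGS under a literal reading of "otherwise") is an assumption of this sketch.

```python
from collections import Counter

IRGS_SUBTYPES = {"SMK", "UVL", "APB", "POL", "MRD"}


def consensus_subtype(predictions, min_votes=3):
    """Return the subtype predicted by at least min_votes classifiers, else 'UND'."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_votes else "UND"


def irgs_status(subtype):
    """Label a consensus subtype as irGS or non-irGS.

    Assumption: 'UND' is not in the irGS set, so it falls through to non-irGS.
    """
    return "irGS" if subtype in IRGS_SUBTYPES else "non-irGS"


# Illustrative predictions from four classifiers for one sample.
subtype = consensus_subtype(["SMK", "SMK", "SMK", "APB"])  # "SMK"
status = irgs_status(subtype)                               # "irGS"
```

Requiring three of four votes means that a 2-2 or 2-1-1 split is never resolved arbitrarily, at the cost of leaving a small share of samples unclassified.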
GS-PRACTICE was tested for its ability to stratify a diverse collection of samples from various sources into genomic subtypes (online supplemental table S2). Collectively, 96%–98% of samples were successfully assigned a subtype, indicating consensus in the ensemble’s individual classifiers. The classifier concordance rate was also consistent across different data sources, and was consistent irrespective of whether samples were of formalin-fixed paraffin-embedded (FFPE) or frozen tissue origin (online supplemental figure S10C). Reanalyses restricted to individual cancer types yielded identical conclusions with respect to tissue origin and source (figure 2B,C).
We applied GS-PRACTICE to 1916 samples of the PCAWG datasets24 using somatic mutation profiles in coding regions obtained from the UCSC Xena25 as input (online supplemental figure S11A,B). The results paralleled those of the TCGA data (figure 2D). The differences in age, sex, and mortality among the subtypes were similar to those of TCGA. Somatic POLE mutations were common in the POL group, somatic MMR mutations in the MRD group, and somatic BRCA alterations (BRCA1/2 mutations with LOH) in the HRD group. APOBEC3 family gene expression was elevated in the APB group. The five irGS subgroups demonstrated increased gene expression and biomarker scores associated with infiltration of cytotoxic CD8+ T cells and ICI response (figure 2D, online supplemental figures S1A and S4A).
GS-PRACTICE as a tumor agnostic predictive biomarker for ICI response
A total of 973 cases with information on objective response to ICI treatment, most of whom had metastatic lesions (online supplemental table S3), were used to challenge and assess the subtyping and (non-)irGS assignment from GS-PRACTICE. Taken in total, the ICI response rate was significantly higher in irGS than in non-irGS tumors (34.6% vs 12.0%, p=5.1×10−14, figure 3A). When analyzed by the eight subtypes, the five subtypes belonging to irGS tended to have a higher response rate than the three non-irGS subtypes (online supplemental figure S12).
Next, to determine a cut-off for assignment of TMB-high, we compared the number of mutations detected in our WES pipeline with those in FoundationOne CDx using a bladder cancer dataset.26 Based on Passing-Bablok regression analysis, the cut-off of 10 mutations per megabase in the CDx panel corresponds to 173 missense mutations in a WES sample (95% CI 138 to 225) (figure 3B). Using this value as the cut-off for TMB-high, tumors categorized as TMB-high showed higher ICI response rate than those as TMB-low (43.5% vs 16.6%, p=3.7×10−20, figure 3C).
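Applying the WES-equivalent cut-off is a one-line comparison, but the width of its confidence interval matters for samples near the threshold. The sketch below uses the values reported above (173 missense mutations, 95% CI 138 to 225). Treating the cut-off as inclusive (greater than or equal to) is an assumption that mirrors the "10 or more mutations per megabase" wording, and the function names are hypothetical.

```python
WES_MISSENSE_CUTOFF = 173   # WES equivalent of 10 mutations/Mb (from the text)
CUTOFF_CI = (138, 225)      # 95% CI of that estimate (from the text)


def tmb_status(n_missense, cutoff=WES_MISSENSE_CUTOFF):
    """Classify a sample by its missense mutation count (inclusive cut-off assumed)."""
    return "TMB-high" if n_missense >= cutoff else "TMB-low"


def is_borderline(n_missense, ci=CUTOFF_CI):
    """True when the call would differ between the two ends of the cut-off's CI."""
    return tmb_status(n_missense, ci[0]) != tmb_status(n_missense, ci[1])


# A sample with 150 missense mutations is TMB-low at 173, but its call
# depends on where in the confidence interval the true cut-off lies.
call = tmb_status(150)        # "TMB-low"
flag = is_borderline(150)     # True
```

Flagging borderline samples in this way makes explicit which TMB calls rest on the regression estimate rather than on a clear separation from the threshold.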
When we divided the tumors into four groups according to the pairwise stratifications of (non-)irGS and TMB-low/high, 97.2% of TMB-high tumors belonged to irGS and 96.9% of non-irGS tumors belonged to TMB-low (figure 3D). Response rate to ICI was highest in the TMB-high irGS group (43.6%). Critically, within TMB-low tumors, irGS tumors had a significantly higher response rate than non-irGS (22.9% vs 11.2%, p=1.1×10−4, figure 3E). Additionally, in a multivariate logistic regression analysis, irGS status was significantly associated with the objective response to ICI after adjustment for TMB-high status and cancer type (adjusted OR, 2.18; 95% CI, 1.40 to 3.40; p=5.6×10−4, figure 4). The trends were similar when examined separately by anti-PD-1 antibody or anti-PD-L1 antibody therapy, as well as by anti-CTLA-4 monotherapy and anti-CTLA-4/anti-PD-1 combination therapy (online supplemental figure S13). These results were also significant when limited to data from the KEYNOTE clinical trials (n=311), a prospective cohort of patients treated solely with anti-PD-1 antibody, pembrolizumab (online supplemental figure S14). Although the KEYNOTE trials excluded patients with clinically diagnosed MMRd tumors at enrolment, two tumors from the cohort (one each with gastric cancer and biliary tract cancer) were classified into the MRD subtype, and both of them responded to ICI. Furthermore, the results were similar when using the cohort’s optimal TMB cut-off determined by the ROC curve and the Youden index or using log(10)-transformed TMB as a continuous variable (online supplemental figure S15). Even when the recently reported score for estimating T-cell infiltration in tumors from WES data27 was added as a covariate, there remained a significant correlation between irGS and ICI response (online supplemental figure S16). 
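As a consistency check on the multivariate result, the standard error, Wald z statistic, and two-sided p value can be recovered from the reported adjusted OR and its 95% CI, assuming the interval is a symmetric Wald interval on the log-odds scale. That assumption, and the function name, belong to this sketch rather than to the original analysis.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z, and two-sided p from an OR and its 95% Wald CI."""
    log_or = math.log(odds_ratio)
    # On the log scale the CI spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Reported values: adjusted OR 2.18, 95% CI 1.40 to 3.40.
se, z, p = wald_from_ci(2.18, 1.40, 3.40)
print(f"SE={se:.3f}, z={z:.2f}, p={p:.1e}")
```

The recovered p value should lie close to the reported p=5.6×10−4; small differences are expected because the OR and CI are rounded to two decimal places.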
The definition of objective response was different in the data of Anagnostou et al compared with other data (see online supplemental methods), but results were similar even after excluding such data (online supplemental figure S17). Genomic subtyping and ICI response analysis by GS-PRACTICE for each of the 13 individual ICI studies comprising the combined 973 patients are described in online supplemental figure S18 and table S4.
Finally, survival analysis was performed using data from the above ICI-treated cohorts (n=606, see the Methods section) to investigate whether irGS assignment by GS-PRACTICE was associated with overall survival. In univariate analysis, both irGS and TMB-high status were associated with favorable outcomes (log-rank test p=5.8×10−9 and p=1.5×10−9, respectively, figure 5A). Stratification analysis by the two statuses showed that the TMB-low non-irGS group had the worst overall survival (log-rank test p=9.0×10−11, figure 5B). This trend was similarly observed when analyzed per cancer type (figure 5C). Furthermore, Cox proportional hazard model analysis adjusted for irGS, TMB status (binary or continuous), and cancer type showed that both irGS and TMB status were independent favorable prognostic factors (figure 5D).
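The survival curves compared above are Kaplan-Meier estimates. The study computed them with Lifelines; the following standard-library sketch of the product-limit estimator is given only to make the calculation explicit, and the function name and toy data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # deaths plus censored at t
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Illustrative data only: five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Computing one such curve per group (for example, irGS versus non-irGS) gives the curves that a log-rank test then compares.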
The relationship between mutational signatures and ICI response has been previously reported for several specific types of cancer. For example, mutational signatures in melanoma10 and non-small-cell lung cancer11 correlate with response to ICI, and these data are explained by the idea that the process of carcinogenesis by exogenous mutagens (UV, tobacco) results in highly immunogenic tumor antigens.28 In addition, APOBEC-related mutational signatures are associated with viral infections and a specific mutational pattern called kataegis, which also produces highly immunogenic antigens29 30 and is associated with ICI response in non-small-cell lung cancer.31 32 On the other hand, it has been reported that high copy number, aneuploidy, and HRD-associated scores inversely correlate with tumor immune response,33–35 and negative results in a recent clinical trial in ovarian cancer where half of the tumors showed HRD36 37 suggest that HRD-related signatures are unlikely to be associated with high sensitivity to ICIs. Aging-related (clock-like) mutational signatures are reported to be associated with lower immune activity in melanoma and non-small-cell lung cancer treated with ICI.9 38 Since many age-related gene mutations also occur in non-tumor cells,39 they may be related to immune tolerance. Our categorization of irGS and non-irGS in this study is supported by previous reports on the relationship between specific mutation signatures and tumor immunogenicity, and provides a cross-organ assessment of this relationship.
In June 2020, the FDA approved pembrolizumab for the treatment of tumors with 10 or more mutations per megabase as determined by FoundationOne CDx.4 This cut-off corresponded to 173 missense mutations in our WES analysis (figure 3B) and was close to the optimal cut-off value of 165 calculated by the ROC curve based on the collated cohort we assembled (online supplemental figure S14). However, TMB quantification based on a panel assay is still subject to fluctuation (figure 3B).40 In particular, tumor-only gene panel testing, including FoundationOne CDx, may overestimate TMB in non-Caucasians due to the paucity of public databases for germline variant filtering.41 42 As sequencing now practically impacts clinical decisions, comprehensive sequencing methods including WES are optimal for reproducible and reliable measurements.43 Furthermore, in currently available gene panel test data,44 less than 0.5% of tumor samples have more than 100 gene mutations, even including synonymous ones, making it difficult to apply GS-PRACTICE to data from such panel assays. As the cost of WES decreases and efforts toward the implementation of WES in routine cancer care continue to advance,45 46 the combination of precision-improved TMB calculation and the orthogonal GS-PRACTICE method will usher in precise patient selection for ICI treatment.
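An ROC-based optimal cut-off of this kind is typically chosen by maximizing the Youden index, J = sensitivity + specificity − 1, over candidate thresholds. The sketch below illustrates that selection with the standard library; it is not the analysis code, the function name and toy data are hypothetical, and it assumes that both responders and non-responders are present.

```python
def youden_optimal_cutoff(values, responders):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    values: a score per sample, e.g. missense mutation count.
    responders: 1 for responders, 0 for non-responders.
    A sample is called positive when value >= cutoff.
    """
    n_pos = sum(responders)
    n_neg = len(responders) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, r in zip(values, responders) if r and v >= cutoff)
        fp = sum(1 for v, r in zip(values, responders) if not r and v >= cutoff)
        j = tp / n_pos - fp / n_neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Illustrative, perfectly separable toy data (not study data).
cutoff, j = youden_optimal_cutoff([50, 80, 120, 200, 300, 400], [0, 0, 0, 1, 1, 1])
```

Because the optimum is estimated from the cohort at hand, a cut-off chosen this way is cohort specific, which is one reason to compare it against an externally defined threshold.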
There have been some criticisms that there is no logical basis for setting a universal TMB threshold for all solid tumors, since such an index is a continuous value that varies considerably among cancer types.7 47 48 Our analysis showed that almost all non-irGS tumors belonged to TMB-low (figure 3D, online supplemental figures S13–S15), indicating that the current TMB cut-off effectively excludes non-irGS tumors, which have little or no immunogenic background mutational processes. In other words, our method may add a biological rationale to the empirically determined TMB cut-off. Additionally, the previous report that the optimal cut-offs for TMB-high differed among cancer types5 may be explained by the different distribution of genomic subtypes per tumor origin (figure 1B).
While GS-PRACTICE represents an advance in cancer diagnostics and clinical decision making, some limitations of this work must be made transparent. First, due to the lack of data from randomized controlled trials, it cannot be concluded that the differences in response rate and prolonged survival observed in this study are fully attributable to ICI efficacy. To elucidate this, appropriately designed randomized controlled trials using ICI are needed. Second, the clinical cohorts used to validate GS-PRACTICE consisted mostly of Caucasian patients, so future validation is needed to determine whether the program is applicable to non-Caucasian patients. Third, accurate subtyping may not be possible for tumors with a small number of mutations due to computational reasons. The clustering results using the four variant callers showed relatively low concordance rates for the HRD, AGE, and GNS subtypes (online supplemental figure S9B). Renal cancers had a moderately low number of mutations and were mostly classified as HRD, but their HRD scores and indel signature six ratios were low (online supplemental figure S7B), indicating that they are unlikely to have HRD properties. It is known that the response to ICI in renal cancer is not associated with TMB,49 and the present analysis also did not identify any characteristic mutation patterns associated with ICI response. One method to improve on the state of the art would be to apply GS-PRACTICE to whole genome sequencing, which can detect dozens of times more mutations than WES.8 This may allow for higher resolution mutation signature analysis and more sophisticated tumor genome subtyping even in tumors with a small number of coding mutations.
GS-PRACTICE represents a pan-cancer advancement in both solid tumor diagnostics and precision medicine, as it subtypes tumors by leveraging mutational signatures with defined etiologies, and the subtypes were shown to be indicative of ICI response. The method can be reproducibly applied to WES data derived from FFPE specimens, and can thus immediately provide a predictive biomarker for ICI treatment in clinical practice. Future analyses of randomized controlled trials and whole genome sequencing will spur improved dataset generation for model building, which will subsequently strengthen the clinical utility of the protocol developed herein.
Patient consent for publication
We acknowledge all the researchers and companies for providing the academic and industrial datasets used in this study. Some of these data were originally obtained by Japanese researchers and are available on the website of the National Bioscience Database Center (NBDC).
Contributors Conceptualization: ST, JH, JBB and NM. Data curation: ST, JH. Formal analysis: ST, OG, KM. Funding acquisition: JH, NM. Investigation: ST, JH, KeY, KoY. Methodology: ST, JBB, NM. Project administration: JH, MM. Resources: JH, SM, MM. Software: ST, JBB. Supervision: NM, SM, MM. Visualization: ST, OG. Writing—original draft: ST, JBB, NM. Writing—review and editing: ST, JH, JBB, KeY, KoY, OG, KM, NM. NM is responsible for the overall content as guarantor.
Funding Japan Society for the Promotion of Science (JSPS) KAKENHI Grant 18H02945. Japan Society for the Promotion of Science (JSPS) KAKENHI Grant 18H02947.
Competing interests JH reports grants from Ono Pharmaceutical, Sumitomo Dainippon Pharma, and MSD outside the submitted work. JBB reports a research grant from Daiichi Sankyo unrelated to this work and a potential conflict of interest as a consultant for Boehringer Ingelheim. KeY reports grants from Bayer outside the submitted work. SM reports a potential conflict of interest as a consultant role for Konica Minolta Precision Medicine. MM reports grants and personal fees from Chugai Pharmaceutical; personal fees from Takeda Pharmaceutical and AstraZeneca; and grants from MSD and Tsumura & Co. outside the submitted work. NM reports personal fees from Takara Bio, Takeda Pharmaceutical and AstraZeneca; and grants from AstraZeneca outside the submitted work. No disclosures were reported from the other authors.
Provenance and peer review Not commissioned; externally peer reviewed.
Data and code availability statement Controlled access data used in this study were obtained from the database of Genotypes and Phenotypes (dbGaP), the European Genome-phenome Archive (EGA), and the National Bioscience Database Center (NBDC) with access permissions according to the respective required procedures. The processed data and analysis codes to reproduce the results are available on the GitHub page (https://github.com/shirotak/pancancer_MutSig_ICI). Other codes for preprocessing or restricted-access data are available from the corresponding author upon reasonable request.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.