Original research
Development and validation of a genomic mutation signature to predict response to PD-1 inhibitors in non-squamous NSCLC: a multicohort study
  1. Xue Bai1,
  2. De-Hua Wu1,
  3. Si-Cong Ma1,2,
  4. Jian Wang1,
  5. Xin-Ran Tang1,
  6. Shuai Kang2,
  7. Qiang John Fu3,
  8. Chuan-Hui Cao1,
  9. He-San Luo1,
  10. Yu-Han Chen1,
  11. Hong-Bo Zhu2,
  12. Hong-Hong Yan4,
  13. Yi-Long Wu4 and
  14. Zhong-Yi Dong1
  1. 1 Department of Radiation Oncology, Southern Medical University Nanfang Hospital, Guangzhou, Guangdong, China
  2. 2 Hepatology Unit and Department of Infectious Diseases, Southern Medical University Nanfang Hospital, Guangzhou, Guangdong, China
  3. 3 Department of Epidemiology and Biostatistics, Saint Louis University College for Public Health and Social Justice, Saint Louis, Missouri, USA
  4. 4 Guangdong Lung Cancer Institute, Guangdong Provincial People's Hospital and Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
  1. Correspondence to Dr Zhong-Yi Dong; dongzy1317{at}


Background Genetic variations in some driver genes in non-small cell lung cancer (NSCLC) have shown a potential impact on the immune microenvironment and are associated with response or resistance to programmed cell death protein 1 (PD-1) blockade immunotherapy. We therefore undertook an exploratory analysis to develop a genomic mutation signature (GMS) that predicts the response to anti-PD-(L)1 therapy.

Methods In this multicohort analysis, 316 patients with non-squamous NSCLC treated with anti-PD-(L)1 from three independent cohorts were included in our study. Tumor samples from the patients were molecularly profiled by MSK-IMPACT or whole exome sequencing. We developed a risk model named GMS based on the MSK training cohort (n=123). The predictive model was first validated in the separate internal MSK cohort (n=82) and then validated in an external cohort containing 111 patients from previously published clinical trials.

Results A GMS risk model consisting of eight genes (TP53, KRAS, STK11, EGFR, PTPRD, KMT2C, SMAD4, and HGF) was generated to classify patients into high and low GMS groups in the training cohort. Patients with high GMS in the training cohort had longer progression-free survival (hazard ratio (HR) 0.41, 0.28–0.61, p<0.0001) and overall survival (HR 0.53, 0.32–0.89, p=0.0275) than those with low GMS. We noted equivalent findings in the internal validation cohort and in the external validation cohort. GMS was shown to be an independent predictive factor for anti-PD-(L)1 therapy when compared with tumor mutational burden. GMS also showed consistent predictive value across patients with different clinicopathological features. Notably, GMS and PD-L1 were both independent predictors and were poorly correlated; inclusion of PD-L1 with GMS further improved the predictive capacity for PD-1 blockade immunotherapy.

Conclusions Our study highlights the potential predictive value of GMS for immunotherapeutic benefit in non-squamous NSCLC. In addition, the combination of GMS and PD-L1 may serve as an optimal pairing for guiding treatment decisions on anti-PD-(L)1 based therapy.

  • oncology
  • biomarkers, tumor
  • genetic markers
  • immunotherapy
  • lung neoplasms

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See


Although recent years have witnessed considerable successes of immunotherapy with immune-checkpoint inhibitors (ICIs) in treating non-small cell lung cancer (NSCLC), its unsatisfactory response rate remains a non-negligible limitation.1 2 Emerging biomarkers have been identified to predict therapeutic benefit: those related to increased expression of tumor-specific neoantigens, such as tumor mutational burden (TMB),3 4 and those indicating an inflamed tumor microenvironment (TME), such as programmed cell death ligand-1 (PD-L1) protein expression on cancer and antigen-presenting cells.5 6 However, current biomarkers still have drawbacks. Intratumor and intertumor heterogeneity, non-standardized cut-off values, unsatisfactory predictive power, and relatively high cost have restricted the clinical application of existing signatures, highlighting the critical need to discover more effective and precise biomarkers.

NSCLC typically harbors a higher mutation load, which can stimulate a stronger antitumor immune response. TMB is thus well suited to predicting the clinical benefit of immunotherapy for NSCLC,7 irrespective of PD-L1 expression.3 8 Recent studies also confirmed the robust association between TMB and response to ICI.9 Despite these discoveries, TMB is still imperfect. Instead of focusing on specific mutations, TMB simply counts the mutations in each tumor’s genome. But individual mutations may exert different, or even negative, influences on the outcome, and some TMB-high patients may harbor decisive mutations (STK11/LKB1, B2M, and so on) that have been shown to be closely related to immunotherapy resistance, leading to unresponsiveness to ICIs.10 11

We therefore set out to build a risk model based on the gene-mutation profiles of patients with non-squamous NSCLC treated with anti-PD-(L)1 based therapy, with the goal of developing a comprehensive molecular signature that can predict immunotherapy responsiveness. By taking these decisive mutations into consideration, we aimed to address the shortcomings of existing biomarkers.


Clinical cohorts

We collated data from previously published clinical cohorts of patients with advanced NSCLC treated with anti-PD-(L)1 based therapy. A total of 240 patients with NSCLC who were treated with anti-PD-(L)1 therapy alone or in combination with anti-CTLA-4 between April 2011 and January 2017 were selected retrospectively from the Memorial Sloan Kettering Cancer Center (MSKCC) dataset, of whom 205 had non-squamous NSCLC. Computer-generated random numbers were used to assign these 205 patients (3:2) to a training cohort of 123 patients and an internal validation cohort of 82 patients.
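
The 3:2 random assignment described above can be illustrated with a minimal sketch. The seed, the patient identifiers, and the use of Python's `random` module are assumptions for illustration only; the source states just that computer-generated random numbers were used.

```python
import random


def split_cohort(patient_ids, n_train, seed=0):
    """Randomly split patients into a training and a validation cohort.

    The seed is a hypothetical choice made for reproducibility; the study
    does not report how its random numbers were generated.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers for the 205 patients with non-squamous NSCLC.
patients = [f"P{i:03d}" for i in range(1, 206)]
training, validation = split_cohort(patients, n_train=123)
print(len(training), len(validation))
```

Shuffling once and slicing guarantees that the two cohorts are disjoint and together cover every patient.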

A total of 111 patients with non-squamous NSCLC treated with anti-PD-(L)1 based therapy in previously published clinical trials (CheckMate-012, n=59; KEYNOTE-001, n=30; Stand Up To Cancer, n=22) were identified as the external validation cohort. Clinical information and DNA sequencing data were obtained from previously published articles.7 12 13 Key variables of these cohorts, including demographic and clinical information, are provided in online supplementary table S1.

Study design

This multicohort study used a three-step approach (training, internal validation, and external validation) to verify the predictive value of the genomic mutation signature (GMS) for PD-(L)1 blockade immunotherapy. We developed the GMS risk model based on the clinical information and gene mutation profiles of the MSK training cohort. The predictive model was first validated in the separate internal MSK cohort and then validated in the external clinical trial cohort. Furthermore, our predictive model was also tested in two independent lung adenocarcinoma cohorts not treated with ICIs (TCGA and MSK non-ICI cohort) to determine whether GMS functioned as a prognostic biomarker in non-squamous NSCLC (online supplementary figure S1).

Analysis of mutational data

The MSK cohort was molecularly profiled by MSK-IMPACT sequencing. The DNA sequencing data were retrieved from cBioPortal or previously published studies.14 MSK-IMPACT sequencing was performed as previously described.15 Germline variants were eliminated through the use of patient-matched DNA extracted from blood. To normalize somatic TMB across panels of various sizes, the total number of mutations was divided by the size of the coding region captured in each panel, which covered 0.98, 1.06, and 1.22 megabases (Mb) in the 341-gene, 410-gene, and 468-gene panels, respectively.
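
The panel-size normalization above amounts to a single division. The following sketch uses the panel sizes quoted in the text; the function name and the example mutation count are hypothetical.

```python
# Coding region captured by each MSK-IMPACT panel, in megabases (from the text).
PANEL_SIZE_MB = {341: 0.98, 410: 1.06, 468: 1.22}


def normalized_tmb(n_mutations, panel_genes):
    """Return mutations per megabase for a given MSK-IMPACT panel version."""
    if panel_genes not in PANEL_SIZE_MB:
        raise ValueError(f"unknown panel: {panel_genes}")
    return n_mutations / PANEL_SIZE_MB[panel_genes]


# Hypothetical tumor carrying 10 somatic mutations, scored on each panel.
for panel in (341, 410, 468):
    print(panel, round(normalized_tmb(10, panel), 2))
```

The same raw count yields a lower normalized TMB on a larger panel, which is why the division is needed before comparing patients profiled with different panel versions.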

External validation cohorts, including CheckMate-012, KEYNOTE-001, and Stand Up To Cancer, were profiled by whole exome sequencing, which was performed as previously described.12 13 Somatic single nucleotide polymorphisms (SNPs) were identified by MuTect, with computational filtering of artifacts introduced by DNA oxidation during sequencing or by formalin-fixed paraffin-embedded (FFPE)-based DNA extraction; under this filter, mutations with allelic fractions of less than 0.05 or coverage of ≤30× were excluded. Indelocator was applied to detect small indels. Annotation of the identified variants was performed using Oncotator.
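
The exclusion thresholds stated above can be written as a simple predicate. This is only a sketch of the two numeric criteria; the `Variant` record and the example calls are hypothetical, and it does not reproduce MuTect or the published artifact filters.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical minimal record for one somatic variant call."""
    gene: str
    allelic_fraction: float
    coverage: int


MIN_ALLELIC_FRACTION = 0.05  # calls below this fraction are excluded
MAX_EXCLUDED_COVERAGE = 30   # calls with coverage <= 30x are excluded


def passes_filter(variant):
    """Keep a call only if it meets both thresholds quoted in the text."""
    return (variant.allelic_fraction >= MIN_ALLELIC_FRACTION
            and variant.coverage > MAX_EXCLUDED_COVERAGE)


calls = [
    Variant("TP53", 0.32, 120),   # kept
    Variant("KRAS", 0.03, 200),   # excluded: allelic fraction < 0.05
    Variant("STK11", 0.20, 30),   # excluded: coverage <= 30x
]
kept = [v.gene for v in calls if passes_filter(v)]
print(kept)
```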

Genes were counted as mutated only if they carried non-silent mutations: nonsense, missense, frameshift, inframe, splice site, translation start site, or nonstop mutations. For some lung cancer driver genes included in our GMS model, such as EGFR and KRAS, mutation status was defined by mutations in specific exons: exons 18 to 21 for EGFR, and exons 2 and 3 for KRAS.


Outcomes of this study included progression-free survival (PFS), overall survival (OS), and objective response rate (ORR). PFS was defined as the time from the start of anti-PD-(L)1 treatment to the first occurrence of a progression event according to Response Evaluation Criteria in Solid Tumors (RECIST) V.1.1 or death. OS was defined as the time from the start of anti-PD-(L)1 treatment to death or the date of the last follow-up. ORR was investigator-assessed according to RECIST V.1.1. Efficacy was also categorized as durable clinical benefit (DCB: complete response (CR)/partial response (PR) or stable disease (SD) that lasted >6 months) or no durable benefit (NDB: progressive disease (PD) or SD that lasted ≤6 months).
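
The durable clinical benefit categories defined above can be expressed as a small classification rule. This sketch assumes that the 6-month duration criterion applies only to stable disease, which is one reading of the definition; the function name and arguments are hypothetical.

```python
def clinical_benefit(best_response, duration_months):
    """Classify a patient as DCB or NDB.

    Assumption: CR/PR always counts as durable clinical benefit, and the
    >6 month requirement is applied to stable disease only.
    """
    response = best_response.upper()
    if response in ("CR", "PR"):
        return "DCB"
    if response == "SD":
        return "DCB" if duration_months > 6 else "NDB"
    if response == "PD":
        return "NDB"
    raise ValueError(f"unknown response category: {best_response}")


print(clinical_benefit("SD", 7.5))
print(clinical_benefit("SD", 6.0))
```

Under this reading, stable disease lasting exactly 6 months falls in the NDB group, because the text specifies a strict >6 month threshold for benefit.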

Statistical analysis

Statistical analyses were conducted using GraphPad Prism (V.7.01), R, and SPSS V.22.0 (SPSS). We used R for the χ² test of the association between individual gene mutation frequencies and the ORR in the MSK cohorts. Maximally selected rank statistics, as implemented in the survminer R package,16 were used to determine the optimal cut-point for GMS. PFS and OS were estimated using the Kaplan-Meier method and compared with the log-rank test, and HRs were calculated using univariate Cox regression analysis. We used multivariate Cox regression analysis with backward selection to test the independent significance of different genes; variables with p>0.15 were removed from the analysis. A GMS risk model consisting of the selected genes was developed based on the multivariable Cox regression analysis. ORRs in different subgroups based on GMS and PD-L1 were compared using a χ² test. Forest plots of HRs for PFS in the subgroup analysis comparing GMS-high with GMS-low were generated using Review Manager V.5.3 (RevMan, Cochrane Collaboration, Oxford, England).
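
For readers unfamiliar with the Kaplan-Meier method mentioned above, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the software used in the study (GraphPad Prism, R, and SPSS); the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event (progression or death) was observed at
    times[i], and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


# Toy data: three patients, the second one censored at 2 months.
curve = kaplan_meier([1, 2, 3], [1, 0, 1])
print(curve)
```

Censored patients leave the risk set without lowering the survival estimate, which is the key difference from a naive proportion of patients still event-free.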


Development of genomic mutation signature (GMS)

To define a GMS associated with clinical benefit from anti-PD-(L)1 based therapy, we studied 205 MSK patients with non-squamous NSCLC and divided them into training (n=123) and internal validation (n=82) cohorts. We first performed a χ² test of the association between each gene mutation and the ORR of patients treated with anti-PD-(L)1 based therapy in the MSK training cohort. This screen yielded 18 candidate genes potentially related to response to anti-PD-(L)1 therapy, selected by the criteria of χ² test p≤0.15 and gene mutation frequency ≥5%. These criteria were chosen on the basis of the genes' potential biological significance and the detecting power indicated in a previous study,17 and to reduce bias resulting from rare mutations (online supplementary table S2). Then, a multivariable Cox regression analysis of these candidate genes was performed for PFS in the MSK training cohort. Finally, eight genes (TP53, KRAS, STK11, EGFR, PTPRD, MLL3/KMT2C, SMAD4, and HGF) were identified as independently associated with PFS in the training cohort (table 1), with no significant difference in mutation frequency among cohorts (online supplementary table S3). A risk score, the GMS, was calculated for each patient using a formula derived from the mutation status (1 or 0) of these eight genes weighted by their regression coefficients.

Table 1

Multivariable Cox regression analysis of candidate mutation genes in training cohort for progression-free survival

GMS score = (0.320×TP53) + (0.359×KRAS) – (0.459×EGFR) – (0.398×STK11) + (0.413×PTPRD) + (0.643×MLL3/KMT2C) – (1.156×SMAD4) + (1.091×HGF).

We assessed the association of clinical and molecular features, including individual genes, GMS, and other known biomarkers (TMB and PD-L1), with response to anti-PD-(L)1 based immunotherapy. Single mutations among the initial 18 candidate genes showed comparatively scattered associations with response to ICI treatment, suggesting that a single mutation is not sufficient to capture the whole landscape of antitumor immune response, whereas the GMS risk score compared well with the known biomarkers TMB and PD-L1 (figure 1).

Figure 1

Summary of clinical and molecular features associated with response to anti-PD-(L)1 based therapy in three cohorts with non-squamous NSCLC. Individual patients are represented in each column, sorted by progression-free survival time and treatment response (CR/PR or SD/PD). PD-L1 expression is stratified as 0%, 1%–49%, or ≥50%. NsM or mutations/megabase, and GMS score from each cohort are shown in histograms. Categories of smoking status (never or ever) and clinical benefit (DCB or NDB) are indicated. The occurrences of selected genes in each case are represented in the OncoPrint. CR, complete response; DCB, durable clinical benefit; GMS, genomic mutation signature; NDB, no durable benefit; NSCLC, non-small cell lung cancer; NsM, nonsynonymous mutations; PD-(L)1, programmed cell death (ligand)1; PR, partial response; SD, stable disease; TMB, tumor mutation burden.

We next used maximally selected rank statistics based on GMS score and PFS to determine the optimal cut-point for separating patients into GMS-high (>0.565) and GMS-low (≤0.565) groups in the training cohort (figure 2A). The GMS model classified 37 (30.1%) of 123 patients into the GMS-high group and 86 (69.9%) into the GMS-low group in the training cohort. We observed a general pattern of favorable objective response (PR/CR) in patients with higher GMS scores (online supplementary figure S2A), and Kaplan-Meier survival curves showed that patients with higher GMS had significantly longer PFS (6.63 vs 2.50 months, HR 0.41, 0.28–0.61, p<0.0001; figure 2B) and OS (not reached vs 13.00 months, HR 0.53, 0.32–0.89, p=0.0275; figure 2C) than those with lower GMS in the training cohort.
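
The GMS score formula and the 0.565 cut-point given above can be combined into a short scoring sketch. The coefficients and cut-point are taken from the text; the function names and the example mutation profiles are hypothetical, and the sketch assumes mutation status has already been called according to the definitions in the Methods (non-silent mutations, with the exon restrictions for EGFR and KRAS).

```python
# Regression coefficients from the GMS score formula in the text.
GMS_COEFFICIENTS = {
    "TP53": 0.320,
    "KRAS": 0.359,
    "EGFR": -0.459,
    "STK11": -0.398,
    "PTPRD": 0.413,
    "KMT2C": 0.643,   # listed as MLL3/KMT2C in the text
    "SMAD4": -1.156,
    "HGF": 1.091,
}
GMS_CUT_POINT = 0.565


def gms_score(mutated_genes):
    """Sum the coefficients of the signature genes that are mutated."""
    mutated = set(mutated_genes)
    return sum(coef for gene, coef in GMS_COEFFICIENTS.items() if gene in mutated)


def gms_group(mutated_genes):
    """GMS-high if the score is strictly above the cut-point, else GMS-low."""
    return "GMS-high" if gms_score(mutated_genes) > GMS_CUT_POINT else "GMS-low"


# Hypothetical mutation profiles.
print(gms_group(["TP53", "KRAS"]))
print(gms_group(["KRAS", "STK11"]))
```

Genes outside the eight-gene signature contribute nothing to the score, so a patient with none of the eight genes mutated scores 0 and falls in the GMS-low group.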

Figure 2

GMS in MSK training and internal validation cohorts of patients with non-squamous NSCLC treated with anti-PD-(L)1 based immunotherapy. (A) Optimal cut-point, determined with the survminer R package, for separating patients into GMS-high and GMS-low groups based on GMS score and progression-free survival in the training cohort. Kaplan-Meier estimates of (B) progression-free survival and (C) overall survival according to GMS status in MSK training cohort. Kaplan-Meier estimates of (D) progression-free survival and (E) overall survival according to GMS status in MSK internal validation cohort. GMS, genomic mutation signature; MSK, Memorial Sloan Kettering; NSCLC, non-small cell lung cancer.

Validation of the predictive value of GMS for anti-PD-(L)1 therapy

To test the generalizability of GMS in predicting the efficacy of anti-PD-(L)1 therapy, we first validated its predictive value in the internal validation cohort. Using the optimal cut-off value determined in the training cohort, the MSK internal validation cohort was separated into 29.3% GMS-high and 70.7% GMS-low patients. A general pattern of favorable objective response (PR/CR) was observed in patients with higher GMS scores (online supplementary figure S2B). Kaplan-Meier survival curves showed that patients with higher GMS had significantly longer PFS (9.20 vs 2.67 months, HR 0.42, 0.25–0.69, p=0.0018; figure 2D) and OS (not reached vs 14.0 months, HR 0.26, 0.13–0.54, p=0.0057; figure 2E) than those with lower GMS.

To further confirm the predictive value of GMS for response to immunotherapy across cohorts, we validated the risk model in previously published clinical trial cohorts. In the external validation cohort, the GMS model classified 34 (30.6%) of 111 patients into the GMS-high group and 77 (69.4%) into the GMS-low group. Patients with higher GMS scores were more likely to achieve a favorable objective response (figure 3A), and notably, patients in the GMS-high group had longer PFS than those in the GMS-low group (22.14 vs 4.11 months, HR 0.31, 0.20–0.48, p<0.0001; figure 3B).

Figure 3

GMS analysis in the external validation cohort of patients with non-squamous NSCLC treated with anti-PD-(L)1 therapy, and in cohorts not treated with immune checkpoint inhibitors. (A) The association of GMS score with PFS time and objective response rate in the external validation cohort. Vertical and horizontal dashed lines indicate the GMS cut-point (0.565) and PFS≥6 months (durable clinical benefit), respectively. (B) Kaplan-Meier estimates of PFS by GMS in external validation cohorts treated with anti-PD-(L)1. Kaplan-Meier estimates of overall survival by GMS in (C) MSK and (D) TCGA lung adenocarcinoma cohorts not treated with immune checkpoint inhibitors. GMS, genomic mutation signature; MSK, Memorial Sloan Kettering; NSCLC, non-small cell lung cancer; PFS, progression-free survival; TCGA, The Cancer Genome Atlas.

Predictive and prognostic role of GMS in non-squamous NSCLC

We next tested whether GMS was an independent predictor for immunotherapy. Univariable and multivariable analyses of clinicopathological factors in the MSK cohort (n=205) were performed for PFS and OS. GMS, TMB, and smoking status were associated with PFS in univariable analysis. After multivariable adjustment for clinicopathological variables, GMS remained the only independent predictive factor for both PFS (HR 0.39, 0.27–0.56; p<0.001) and OS (HR 0.43, 0.26–0.71; p=0.001) (table 2).

Table 2

Univariate and multivariable Cox regression analysis of predictive factors in MSK cohort (n=205) treated with anti-PD-(L)1 for progression-free survival and overall survival

Since the GMS risk model includes several driver genes of lung cancer, we also investigated the prognostic role of GMS in two lung adenocarcinoma cohorts (MSK non-ICI and The Cancer Genome Atlas (TCGA)) not treated with immune checkpoint inhibitors. There was no significant difference in overall survival between the GMS-high and GMS-low groups in either the MSK non-ICI cohort, which consisted of patients with stage IV cancer (n=1105, HR 0.82, 0.62–1.10; p=0.1663, figure 3C), or the TCGA cohort, which mainly contained patients with early stage cancer (n=520, HR 1.16, 0.84–1.60; p=0.3711, figure 3D), suggesting that GMS functions as a predictive biomarker but not a prognostic biomarker.

Applicability of GMS in patients with different clinical characteristics

We next investigated whether the GMS risk model was applicable across different populations or restricted to specific groups. A univariate subgroup analysis of PFS was performed in the three cohorts according to baseline clinical and biological characteristics. A high GMS score was significantly associated with longer PFS regardless of age (≤65 vs >65) and gender (male vs female). GMS also showed consistent predictive value for anti-PD-(L)1 therapy regardless of treatment line (1st vs ≥2nd), treatment type (monotherapy vs combination treatment), and whether or not patients were enrolled in clinical trials. More importantly, GMS served as an effective predictor independent of PD-L1 expression level (≥50% vs 1%–49% vs 0), suggesting that GMS is a potential biomarker for anti-PD-(L)1 therapy in non-squamous NSCLC across different clinical characteristics (figure 4).

Figure 4

Subgroup analysis of GMS for progression-free survival from three cohorts according to baseline clinicopathological characteristics. The HR of each subgroup was obtained from univariate analysis. Pooled HRs for each subgroup were computed using a fixed-effects model. The bars indicate 95% CI. GMS, genomic mutation signature; PD-(L)1, programmed cell death (ligand)1; PD-L1, programmed cell death ligand-1.

The significance of combining GMS with PD-L1 for the prediction of anti-PD-(L)1 therapy

Recent studies have highlighted the association of TMB or PD-L1 with response to PD-1 blockade immunotherapy. We pooled the patients in the three cohorts who had valid TMB, PD-L1, and GMS results. In total, 156 patients with non-squamous NSCLC and complete data were included in the univariable and multivariable analyses of PFS. GMS, TMB, PD-L1, and smoking status were all effective predictors of PFS in the univariable analysis. However, multivariable analysis demonstrated that GMS (HR 0.30, 0.19–0.49; p<0.0001) and PD-L1 (HR 0.64, 0.44–0.93; p=0.020) were the two independent predictors for anti-PD-(L)1 treatment (table 3).

Table 3

Univariate and multivariable Cox regression analysis of progression-free survival in three anti-PD-(L)1 treated cohorts with valid TMB, PD-L1, and GMS data (n=156)

Given that GMS and PD-L1 were independent predictors, we next investigated the correlation between these two biomarkers. As expected, few patients had both GMS>0.565 (cut-point) and PD-L1 ≥50%, indicating that these two biomarkers were poorly correlated (online supplementary figure S3A). Based on these findings, we speculated that GMS and PD-L1 might act synergistically in predicting response to immunotherapy. Kaplan-Meier analysis of the MSK cohort revealed that dual-positive patients (GMShighPD-L1high: GMS>0.565, PD-L1 ≥1%) had longer PFS than single-positive (GMShigh or PD-L1high) or dual-negative (GMSlowPD-L1low) patients (14.50 vs 5.47 vs 2.10 months, p<0.0001, figure 5A). Since there was no significant difference between GMShighPD-L1low and GMSlowPD-L1high (online supplementary figure S4), and each group would have a smaller sample size if analyzed separately, we combined them as the GMShigh or PD-L1high group. Analysis of objective response also showed that the GMShighPD-L1high subgroup had a higher proportion of PR/CR than the other two groups (50% vs 22.2% vs 3.2%, p=0.001, figure 5C). The GMShighPD-L1high group also had significantly prolonged PFS compared with the other two subgroups in the external validation cohorts (22.14 vs 6.50 vs 5.09 months, p=0.0012, figure 5B). Similarly, a higher proportion of PR/CR was observed in the GMShighPD-L1high group than in the other two groups in the external validation cohort (62.5% vs 30.3% vs 13.6%, p=0.002, figure 5D). These findings suggest that adding PD-L1 to GMS might further improve the prediction of response to anti-PD-(L)1 based therapy.
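
The three-group stratification described above (dual positive, single positive, dual negative) can be sketched as follows, using the thresholds stated in the text (GMS>0.565 and PD-L1 ≥1%). The function name and the example values are hypothetical.

```python
GMS_CUT_POINT = 0.565
PD_L1_POSITIVE_PERCENT = 1.0


def combined_group(gms, pd_l1_percent):
    """Assign a patient to one of the three combined GMS/PD-L1 groups."""
    gms_high = gms > GMS_CUT_POINT
    pd_l1_high = pd_l1_percent >= PD_L1_POSITIVE_PERCENT
    if gms_high and pd_l1_high:
        return "GMShighPD-L1high"
    if gms_high or pd_l1_high:
        # GMShighPD-L1low and GMSlowPD-L1high are pooled, as in the text.
        return "GMShigh or PD-L1high"
    return "GMSlowPD-L1low"


# Hypothetical patients: (GMS score, PD-L1 expression in percent).
for gms, pd_l1 in [(0.679, 60.0), (0.679, 0.0), (0.320, 5.0), (-0.039, 0.0)]:
    print(gms, pd_l1, combined_group(gms, pd_l1))
```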

Figure 5

Association of a combination of GMS and PD-L1 with response to anti-PD-(L)1 therapy. Kaplan-Meier estimates of progression-free survival classified by the status of GMS and PD-L1 in (A) MSK cohort and (B) external validation cohorts. Proportional representation of objective response rate among subgroups categorized by GMS and PD-L1 in (C) MSK cohort and (D) combined external validation cohorts. GMS, genomic mutation signature; MSK, Memorial Sloan Kettering; PD-L1, programmed cell death ligand-1.


Cancer immunotherapy, especially with ICIs, has been transformative, but only a proportion of patients derive durable benefit from it; more accurate biomarkers are thus needed. In this multicohort retrospective study, we developed and validated a novel eight-gene tool that can better predict the efficacy of PD-1 blockade immunotherapy in non-squamous NSCLC. Our study demonstrated that GMS can stratify patients into a subgroup that benefits, with significant PFS and OS advantages, and a subgroup that does not, in three independent ICI-treated cohorts. It is also worth noting that, compared with a whole genome assay, GMS may significantly decrease cost by reducing the test to a small panel of eight decisive, commonly mutated genes, which makes it easier to implement in clinical practice and allows a more affordable, large-scale approach.

Emerging evidence has indicated that carcinogenic driver genes can modulate the tumor immune milieu, especially in lung adenocarcinoma.18–21 Nevertheless, a single alteration may not be enough to determine the outcome and is thus unlikely to be a sufficiently comprehensive standalone biomarker for ICIs. For example, TP53 alteration significantly increased PD-L1 expression and the expression of genes related to immune cytolytic activity, but the mutation alone could not identify patients with lung adenocarcinoma who were sensitive to immunotherapy, whereas those with co-occurring TP53/KRAS mutations responded remarkably well to PD-1 blockade therapy,22 indicating the need for a model that combines different genes. We therefore built a model that considers most of the potentially decisive genes together, and found eight genes that can effectively distinguish sensitive patients. Each of these genes may affect the immune contexture or immune response in some way; thus, our signature has a biological rationale. In addition to TP53/KRAS, EGFR mutation, which occurs nearly exclusively in adenocarcinoma, is associated with an uninflamed TME and poor immunogenicity, resulting in impaired response to PD-1 inhibitors,22 while STK11 alteration, which correlates with a “cold” immune microenvironment, has been recognized as the most prevalent driver of primary resistance to PD-1 blockade immunotherapy in KRAS-mutant lung adenocarcinoma.10 Additionally, SMAD4 mutation is associated with upregulation of extracellular matrix genes in cancer, which is linked to cancer immune evasion through TGF-β signaling activation.23 As a result, GMS holds promise for predicting immunotherapy efficacy, since almost every one of these eight genes might reflect underlying tumor-immune interactions.

Favorable immunotherapy responses tend to be observed in carcinogen-driven cancers, such as melanoma and NSCLC,24 both of which typically carry a higher mutational load, and TMB is thus a particularly suitable biomarker for NSCLC. But it still has limitations: some patients with high TMB do not respond, and vice versa. One important reason is that TMB focuses only on the quantity of mutations, so the functional heterogeneity of gene alterations that potentially shape the tumor-immune landscape is diluted. Commonly mutated genes, through activation of oncogenes and loss of tumor suppressor genes, may influence the recruitment, activation, or restraint of the immune milieu;25 as mentioned above, STK11 alteration may lead to primary resistance to ICIs.10 Additionally, gene defects in the pathways involved in interferon-receptor signaling and in antigen presentation, such as JAK1, JAK2, IFNGR1, and B2M, were found to be associated with acquired resistance to ICIs.11 26 Patients carrying these mutations tend to respond poorly to ICIs even when TMB is high. As a result, predicting response from the overall burden of mutations alone is far from sufficient. Further, we found that GMS was positively correlated with TMB in the MSK (R²=0.230, p<0.0001, online supplementary figure S5A) and the external validation cohorts (R²=0.264, p<0.0001, online supplementary figure S5B). In addition, 51 patients overlapped between TMB ≥upper quantile (64.6%) and GMS>0.565 (54.3%), which indicated a high correlation between them (online supplementary figure S3B). Moreover, GMS rather than TMB was an independent predictor in our multivariate analysis, indicating that GMS might capture the information in TMB with more predictive power. We also examined the possibility of combining TMB with GMS to predict efficacy and found that patients with GMShighTMBhigh did not have longer PFS than patients with GMShighTMBlow (p=0.7971) (online supplementary figure S6A). Further, adding TMB to the combination model of GMS and PD-L1 might bring relatively limited improvement (online supplementary figure S6B).

Importantly, our risk model is promising because of its broad applicability; GMS showed equivalent predictive value for anti-PD-(L)1 therapy in the cohorts regardless of treatment line (1st vs ≥2nd) and treatment type (monotherapy vs combination treatment). PD-L1 protein expression, by contrast, is flawed as a biomarker in this regard: although it is applicable for predicting the efficacy of anti-PD-(L)1 monotherapy (for example, pembrolizumab is approved for patients with NSCLC who are PD-L1 positive), it notoriously lacks predictive power for response to combination immunotherapy.3 Going a step further, we also found that GMS was an effective predictor regardless of PD-L1 expression level, suggesting that it can identify potentially sensitive patients with low or undetectable PD-L1 expression who may experience DCB. In addition, we explored the stability of our GMS model by selecting the optimal cut-off value from a ROC curve based on GMS scores and ORR. Using the Youden index, the optimal cut-off value selected by ROC was 0.609 (online supplementary figure S7A). Only one patient in the training cohort and two patients in the internal validation cohort changed GMS group, while no difference was observed in the external validation cohort. We used the ROC-derived cut-off value to repeat the survival analysis in the MSK cohorts, and the results remained consistent (online supplementary figure S7B–F), supporting the robustness of our model from another point of view.
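
The ROC-based cut-off selection mentioned above relies on the Youden index (sensitivity + specificity − 1). A minimal sketch follows; the data are hypothetical, and the convention that a patient is predicted to respond when the score is strictly above the cut-off is an assumption chosen to match the GMS-high definition (>cut-point).

```python
def youden_cutoff(scores, responded):
    """Return (cut-off, Youden index) maximizing sensitivity + specificity - 1.

    A patient is predicted to respond when the score is strictly above
    the cut-off. Candidate cut-offs are the observed score values; the
    smallest cut-off wins when several share the maximum index.
    """
    n_pos = sum(1 for r in responded if r)
    n_neg = len(responded) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one responder and one non-responder")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, r in zip(scores, responded) if r and s > cut)
        true_neg = sum(1 for s, r in zip(scores, responded) if not r and s <= cut)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical scores and objective responses (1 = CR/PR, 0 = otherwise).
toy_scores = [0.1, 0.2, 0.6, 0.7, 0.9]
toy_response = [0, 0, 1, 1, 1]
cutoff, j_index = youden_cutoff(toy_scores, toy_response)
print(cutoff, j_index)
```

Unlike the maximally selected rank statistics used for the primary cut-point, this criterion uses objective response rather than PFS, so agreement between the two cut-offs is a check on stability rather than a restatement of the same analysis.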

We also explored the parsimony and sensitivity of the GMS model. We tightened the criterion for gene selection in the multivariate Cox regression analysis to p≤0.10 and p≤0.05, respectively; in both cases, three GMS genes (MLL3/KMT2C, SMAD4, and HGF) were included for modeling. We also developed a model based on four well-established genes of GMS (TP53, KRAS, EGFR, and STK11). However, neither of the two models was able to predict both PFS and OS in the three cohorts (online supplementary figure S8). Furthermore, GMS (the eight-gene model) outperformed the other models regardless of cohort according to HR (online supplementary table S4). In addition, to test the non-randomness of GMS, thousands of randomly selected eight-gene models were generated through computer simulation. Only around 5% of them (online supplementary table S5) were eligible, while those outperforming GMS were even rarer (online supplementary table S6). These results indicate that our GMS model is non-random and irreplaceable, with every gene playing a crucial role in predicting survival.

Several limitations of our study should be addressed. First, the low GMS group still included CR/PR cases, indicating that our model cannot precisely separate all responders from non-responders to anti-PD-(L)1 therapy. Intriguingly, however, when we examined all CR/PR cases in the MSK and external validation cohorts and compared PFS between the GMShigh and GMSlow groups, the GMShigh group still had longer PFS, implying that patients with GMShigh are more likely to have a durable response (online supplementary figure S9). Nonetheless, numerous cellular and molecular mechanisms are involved in immunotherapy, for example, PD-L1 expression, so our GMS model may not cover all possible situations in antitumor immune responses. Notably, we demonstrated that it is reasonable to combine GMS with PD-L1, which might help guide clinical decisions in NSCLC. Patients with GMShighPD-L1high should be preferentially recommended for anti-PD-(L)1 therapy, while patients with GMShighPD-L1low or GMSlowPD-L1high can optionally consider anti-PD-(L)1 therapy; for patients with GMSlowPD-L1low, however, anti-PD-(L)1 therapy should be chosen with caution. Second, the GMS model and the clinical recommendation were based on small cohorts and retrospective analysis. Third, heterogeneity existed among our cohorts. Most patients in the external validation cohort received first-line combination therapy, whereas the majority of MSK patients received second-line treatment with anti-PD-1 monotherapy. There was also internal heterogeneity within the external validation cohort, which was composed of three different cohorts with small numbers of patients, different mutation panels, and so on. Therefore, further validation in large, independent, prospective clinical trials is warranted.
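
The combined GMS and PD-L1 stratification proposed above can be written out as a small lookup, shown here only to make the three tiers explicit. The function name and tier labels are illustrative, and the inputs are assumed to be already dichotomized, since the cut-offs defining "high" are not restated in this section.

```python
def anti_pd1_tier(gms_high: bool, pdl1_high: bool) -> str:
    """Map dichotomized GMS and PD-L1 status to the tier described in the text.

    Assumption: both markers were classified as high/low upstream.
    """
    if gms_high and pdl1_high:
        return "preferentially recommended"
    if gms_high or pdl1_high:
        # Exactly one of the two markers is high.
        return "optional"
    return "choose with caution"


# Enumerate all four marker combinations.
tiers = {
    (gms, pdl1): anti_pd1_tier(gms, pdl1)
    for gms in (True, False)
    for pdl1 in (True, False)
}
```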


Ultimately, our study demonstrated that GMS might be an optional and cost-effective tool to predict responses to PD-1 blockade therapy in patients with non-squamous NSCLC. Our quantitative model may enable rational, biology-driven personalized immunotherapy and is simple to implement in the real-world setting. In addition, this approach may offer a research framework for constructing and evaluating response biomarkers of ICI-based therapy in other tumor types.



  • XB, D-HW and S-CM contributed equally.

  • Correction notice This article has been corrected since it was published Online First. The title has been updated as the last letter of 'study' was missing. Additionally the tables were rearranged as they were in the incorrect order.

  • Contributors Study design: Z-YD, Y-LW, D-HW. Data collection: Z-YD, XB, JW, X-RT. Data analysis and interpretation: Z-YD, XB, S-CM, X-RT, SK, C-HC, H-SL, Y-HC, H-BZ. Writing of the manuscript: Z-YD, XB. Revision of the manuscript: Y-LW, D-HW, S-CM, JW, X-RT, SK, H-BZ. Statistical analysis: QJF, H-HY, X-RT. All authors have reviewed the manuscript and approved the final version.

  • Funding This study was supported by the National Natural Science Foundation for Young Scientists of China (Grant No. 81802863 and 81902353), the Natural Science Foundation of Guangdong Province (Grant No. 2018030310285), the Outstanding Youths Development Scheme of Nanfang Hospital, Southern Medical University (Grant No. 2017J003) and the General Project of Postdoctoral Science Foundation of China (2019M662996).

  • Competing interests None declared.

  • Patient consent for publication Obtained.

  • Ethics approval This study was approved by the Institutional Ethical Review Boards of Nanfang Hospital. Patients included in three cohorts provided signed informed consent in accordance with their clinical study protocols.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available in a public, open access repository. All data relevant to the study are included in the article or uploaded as supplementary information. The DNA sequencing data and clinical information of MSK cohort were retrieved from the cBioPortal for Cancer Genomics ( and external validation cohort were extracted from the supplementary materials of previously published studies as described in Methods section.