Original research
Machine learning analysis of pathological images to predict 1-year progression-free survival of immunotherapy in patients with small-cell lung cancer
  1. Ryota Shibaki1,
  2. Daichi Fujimoto1,
  3. Tsukasa Nozawa2,
  4. Akira Sano2,
  5. Yuka Kitamura3,
  6. Junya Fukuoka3,
  7. Yuki Sato4,
  8. Takashi Kijima5,
  9. Hirotaka Matsumoto6,
  10. Toshihide Yokoyama7,
  11. Satoru Miura8,
  12. Akito Hata9,
  13. Motohiro Tamiya10,
  14. Yoshihiko Taniguchi11,
  15. Jun Sugisaka12,
  16. Naoki Furuya13,
  17. Hisashi Tanaka14,
  18. Nobuyuki Yamamoto1,15,
  19. Yasuhiro Koh1,15 and
  20. Hiroaki Akamatsu1
  1. Internal Medicine Ⅲ, Wakayama Medical University, Wakayama, Japan
  2. ExaWizards Inc, Tokyo, Japan
  3. Department of Pathology Informatics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
  4. Department of Respiratory Medicine, Kobe City Medical Center General Hospital, Hyogo, Japan
  5. Department of Respiratory Medicine and Hematology, Hyogo Medical University, Hyogo, Japan
  6. Department of Respiratory Medicine, Hyogo Prefectural Amagasaki General Medical Center, Hyogo, Japan
  7. Department of Respiratory Medicine, Kurashiki Central Hospital, Okayama, Japan
  8. Department of Internal Medicine, Niigata Cancer Center Hospital, Niigata, Japan
  9. Division of Thoracic Oncology, Kobe Minimally Invasive Cancer Center, Hyogo, Japan
  10. Department of Thoracic Oncology, Osaka International Cancer Institute, Osaka, Japan
  11. Department of Internal Medicine, NHO Kinki Chuo Chest Medical Center, Osaka, Japan
  12. Department of Pulmonary Medicine, Sendai Kousei Hospital, Miyagi, Japan
  13. Division of Respiratory Medicine, Department of Internal Medicine, St. Marianna University School of Medicine, Kanagawa, Japan
  14. Department of Respiratory Medicine, Hirosaki University Graduate School of Medicine, Hirosaki, Aomori, Japan
  15. Center for Biomedical Sciences, Wakayama Medical University, Wakayama, Japan
  Correspondence to Dr Daichi Fujimoto; daichi{at}wakayama-med.ac.jp

Abstract

Background In small-cell lung cancer (SCLC), the tumor immune microenvironment (TIME) could be a promising biomarker for immunotherapy, but objectively evaluating TIME remains challenging. Hence, we aimed to develop a predictive biomarker of immunotherapy efficacy through a machine learning analysis of the TIME.

Methods We conducted a biomarker analysis in a prospective study of patients with extensive-stage SCLC who received chemoimmunotherapy as the first-line treatment. We trained a model to predict 1-year progression-free survival (PFS) using pathological images (H&E, programmed cell death-ligand 1 (PD-L1), and a double immunohistochemical assay (cluster of differentiation 8 (CD8) and forkhead box P3 (FoxP3))) and patient information. The primary outcome was the mean area under the curve (AUC) of machine learning models in predicting the 1-year PFS.

Results We analyzed 100,544 patches of pathological images from 78 patients. The mean AUC values of the patient information, pathological image, and combined models were 0.789 (range 0.571–0.982), 0.782 (range 0.750–0.911), and 0.868 (range 0.786–0.929), respectively. The PFS was longer in the high efficacy group than in the low efficacy group in all three models (patient information model, HR 0.468, 95% CI 0.287 to 0.762; pathological image model, HR 0.334, 95% CI 0.177 to 0.628; combined model, HR 0.353, 95% CI 0.195 to 0.637). The machine learning analysis of the TIME had better accuracy than the human count evaluations (AUC of human count, CD8-positive lymphocytes: 0.681, FoxP3-positive lymphocytes: 0.626, PD-L1 score: 0.567).

Conclusions The spatial analysis of the TIME using machine learning predicted the immunotherapy efficacy in patients with SCLC, thus supporting its role as an immunotherapy biomarker.

  • Lung Neoplasms
  • Computational Biology
  • Immune Checkpoint Inhibitors
  • Tumor Biomarkers
  • Tumor Microenvironment

Data availability statement

Data are available on reasonable request. The corresponding author declares that he had full access to all the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis. The data supporting the findings of this study are available from the corresponding author on reasonable request. The data are not publicly available due to privacy and ethical restrictions.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • There are no available definitive biomarkers for predicting the efficacy of immunotherapy in small-cell lung cancer (SCLC). Although the tumor immune microenvironment (TIME) influences the efficacy of the immunotherapy, performing an objective spatial analysis of the TIME remains challenging.

WHAT THIS STUDY ADDS

  • The TIME is a potential source of biomarkers for immunotherapy efficacy according to spatial analysis in SCLC, which currently has no established biomarkers.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • This study indicates that evaluating the TIME of pathological images is important for predicting the efficacy of immunotherapy. Moreover, compared with manual count assessments, machine learning analysis offers improved objective spatial analysis, and a spatial analysis of the TIME is essential for enhancing the reliability of a biomarker in clinical practice.

Introduction

Lung cancer is one of the primary causes of cancer-related deaths.1 Small-cell lung cancer (SCLC), which accounts for 15% of all lung cancer cases, is characterized by rapid proliferation, early development of widespread metastases, and poor survival rates.2 3 Combining immunotherapy with chemotherapy has durable clinical activity and manageable safety.4 5 Hence, chemoimmunotherapy is recommended as first-line treatment for extensive-stage (ES) SCLC.

Notably, long-term efficacy is a hallmark feature of immunotherapy; however, a 1-year progression-free survival (PFS) has only been observed in 10%–20% of patients with SCLC.4 5 Thus, the development of biomarkers for predicting the efficacy of immunotherapy is required as no definitive biomarkers have been established for SCLC.6 7 Recently, the concept of the tumor immune microenvironment (TIME), where the activation and suppression of lymphocytes and their localization with the tumor take place, has attracted attention as a biomarker for immunotherapy.8–11 The localization of immune cells (cluster of differentiation 8 (CD8)-positive cells and forkhead box P3 (FoxP3)-positive cells) reflects the host antitumor immunity status.12–14

Several important clinical issues should be considered when evaluating the TIME. The spatial analysis of the relationship between inflammatory cells and the tumor is important for the accurate evaluation of the TIME. However, due to the complexity of interactions between various inflammatory cells and tumor cells, it is difficult to determine which interactions are important for enhancing the immunotherapeutic efficacy. Furthermore, objectively evaluating such spatial analysis remains challenging. Recently, machine learning has attracted significant attention for standardization, reproducibility, and improving accuracy.15 16 Therefore, to develop an objective biomarker for assessing the efficacy of immunotherapy using the TIME, the precise and objective spatial analysis of the complex positioning of inflammatory and tumor cells should be performed using machine learning. Another issue in examining the usefulness of the TIME as a biomarker in clinical practice is that the patient’s background affects the TIME and the efficacy of immunotherapy17 18; thus, it is necessary to determine whether the TIME status is a more accurate predictor of immunotherapeutic efficacy than patient information through the conduct of prospective cohort studies that involve the collection of high-quality patient information. Therefore, we aimed to investigate the feasibility of developing biomarkers for determining the efficacy of immunotherapy in prospective cohorts of ES-SCLC patients through a machine learning analysis of the TIME and patient information.

Methods

Patients

This prospective, hospital-based cohort study, conducted at 32 centers, was a biomarker analysis of the APOLLO study. It included patients with ES-SCLC who received carboplatin and etoposide with atezolizumab as first-line treatment in Japan between September 2019 and September 2020.19 In addition to meeting the eligibility criteria of the APOLLO study, patients were required to have had adequate biopsy specimens obtained before initiating the first-line treatment. We only used biopsy samples from the lungs to accurately evaluate the TIME status. The cut-off date for data collection was September 2021.

The study was registered in the University Medical Information Network Clinical Trials Registry (UMIN 000039600).

Processing of pathological images

The tissue samples were fixed in 10% formalin and embedded in paraffin. Immunohistochemical analysis was performed on 4 µm thick, formalin-fixed, paraffin-embedded tissue sections. Each section was deparaffinized in xylene, rehydrated, and incubated in 0.03% H2O2 in 95% methanol for 10 min. All samples were stained with H&E, programmed cell death-ligand 1 (PD-L1, Dako PD-L1 immunohistochemistry (IHC) 28-8 pharmDx assay), and a chromogen-based double immunohistochemical assay (CD8 (clone: C8/144B) and FoxP3 (clone: 236A/E7)) based on the results of recent studies.13 20–22 For double staining using primary antibodies from the same species, the primary antibody and linked enzymes from the first staining were inactivated. After the first staining with AEC (3-amino-9-ethylcarbazole), slides were washed in phosphate-buffered saline for 5 min and autoclaved at 110°C for 10 min in the epitope retrieval solution for the second antigen, and the second staining with DAB (diaminobenzidine) was then carried out. After staining, sections were counterstained with Mayer’s hematoxylin, dehydrated with alcohol, cleared in xylene, and permanently mounted. Representative images of H&E, PD-L1, and double IHC are provided in online supplemental figure 1. All slides were scanned using a whole-slide imaging scanner. The assigned pathologists (YKitamura and JF) manually annotated the tumor area in all samples on the H&E staining images to assess the relationship between each inflammatory cell and the tumor. In the CD8 and FoxP3 double-staining slides, the number of each type of infiltrating immune cell was counted in up to four 1 mm2 fields, and the mean tumor-infiltrating lymphocyte (TIL) count was recorded.12 13 The tumor cell PD-L1 expression was defined as the percentage of viable tumor cells with partial or complete membrane staining in at least 100 viable tumor cells. The Combined Positive Score (CPS) was generated by scoring the PD-L1-stained slides using the CPS algorithm23; it was calculated by dividing the number of PD-L1-positive tumor cells, lymphocytes, and macrophages by the total number of viable tumor cells and multiplying the result by 100.
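
The CPS arithmetic described above can be illustrated with a minimal sketch. The function name and arguments are hypothetical, and capping the score at 100 follows the usual CPS scoring convention rather than anything stated in this article.

```python
def combined_positive_score(pos_tumor, pos_lymphocytes, pos_macrophages,
                            viable_tumor_cells):
    """Illustrative CPS: PD-L1-positive cells / viable tumor cells x 100.

    All argument names are hypothetical. The cap at 100 is an assumption
    based on the common scoring convention, not taken from the article.
    """
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    positive = pos_tumor + pos_lymphocytes + pos_macrophages
    score = 100.0 * positive / viable_tumor_cells
    return min(score, 100.0)
```

For example, 10 positive tumor cells, 5 positive lymphocytes, and 5 positive macrophages among 200 viable tumor cells would give a CPS of 10.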

Development of the predictive model

We aimed to develop a predictive model for 1-year PFS using a machine learning algorithm. In previous phase III trials,4 5 the 1-year PFS rate was 5.3%–5.4% in the chemotherapy group, compared with 12.6%–17.9% in the chemoimmunotherapy group. This implies that most of the patients who achieved 1-year PFS did so because of immune checkpoint inhibitor therapy. A two-class classifier that predicts the 1-year PFS was trained using the whole-slide pathological images of H&E, PD-L1 expression, double IHC (CD8 and FoxP3), and patient information (figure 1). The predictive model was developed in three steps: “preprocessing of patch image dataset (figure 1A,B),” creation of a “patch-based pipeline (figure 1C,D),” and completion of “per-patient pipeline (figure 1E,F).” During the model development, our network was trained using a fivefold cross-validation approach that involved splitting the full set of our patch image dataset into five balanced subsets (figure 1G). The balanced subsets were not produced by a simple random split of patches. Instead, we divided all patients into five groups so that the ratio of 1-year PFS responders to non-responders was the same in each group. Patches were then generated from each patient and used in the analyses, so that no patient contributed patches to both the training and validation subsets (ie, there was no leakage). After assessing the predictive model, the optimal point on the receiver operating characteristic (ROC) curve from each fold was identified using the Youden index (sensitivity+specificity−1, which ranges from 0 to 1). The corresponding threshold values were then set as cut-off points for stratifying patients into high or low efficacy groups.
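
A minimal sketch of the two ideas in this paragraph, a patient-level split that keeps the responder ratio similar across folds and a Youden-index cut-off, is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the round-robin assignment, and the seed are all hypothetical.

```python
import random


def stratified_patient_folds(labels, n_folds=5, seed=0):
    """Split patients (not patches) into folds with a similar responder ratio.

    labels: dict mapping a patient id to 1 (1-year PFS achieved) or 0.
    Round-robin assignment per class is an assumed strategy.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in (1, 0):
        ids = sorted(pid for pid, y in labels.items() if y == cls)
        rng.shuffle(ids)
        for i, pid in enumerate(ids):
            folds[i % n_folds].append(pid)
    return folds


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        # A patient is called "high efficacy" when the score is >= t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Because patches would be generated only after patients are assigned to folds, all patches from one patient stay in a single fold, which is the property that prevents leakage.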

Figure 1

Overview of the predictive model for the achievement of 1-year progression-free survival (PFS). (A) H&E, PD-L1, and double immunohistochemistry (CD8 and FoxP3) analyses were performed in all samples. The green contour represents the annotation of the tumor area. (B) Patches from the same position within the annotated area. Each patch was 128 pixels in both height and width. (C) A pretrained EfficientNet-B0 model was used. (D) A patch-based pipeline predicted the confidence score for 1-year PFS for each patch. (E) Patient information was combined with the histogram confidence score calculated from the patch-based pipeline. (F) The patient-based classifier, which uses LightGBM, predicts the confidence score for 1-year PFS. (G) The overall patient data were divided into five randomly chosen folds of equal size. One fold was used to validate the model trained using the remaining folds. (H) The overall patients (including the training and validation cohorts) were classified into high and low efficacy groups using the predictive model from each fold. BatchNorm, batch normalization layer; CD8, cluster of differentiation 8; Conv, convolution layer; FC, fully connected layer; LeakyReLU, leaky rectified linear unit layer; MBConv, Mobile Inverted Bottleneck Convolution Layer; PD-L1, programmed cell death ligand 1; Train, training cohort; Val, validation cohort.

Preprocessing of the patch image dataset

For preprocessing, the three types of whole-slide images were processed. The tumor area annotation was transferred from the H&E staining images to the other types of images. Because the whole-slide images are not spatially aligned with one another, the same annotation cannot be applied to them directly. To avoid this problem, we aligned the images in two steps. First, the same tumor region was manually annotated in each image. Second, a non-rigid image registration framework was constructed using the “Insight Segmentation and Registration Toolkit”24 (figure 1A). Next, we eliminated the background by setting a threshold for white color. In the final step, we divided the inside of the annotation into 128-pixel square images without overlap (figure 1B). This process generated N patch images of 128 pixels in height and width for each patient, where N is the total number of patches and differs between patients.
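
The tiling step can be sketched as follows. This is a hedged illustration only: the function name, the rule that a patch must lie entirely inside the annotation, the white-level threshold, and the maximum white fraction are all assumptions, since the article does not specify them.

```python
def tile_patches(gray, mask, patch=128, white_level=220, max_white_frac=0.5):
    """Return top-left (row, col) coordinates of non-overlapping patches.

    gray: 2D list of grayscale values (0-255) for one registered image.
    mask: 2D list of booleans, True inside the annotated tumor area.
    A patch is kept only if it lies fully inside the annotation and is
    not mostly white background; both rules are assumed for illustration.
    """
    height, width = len(gray), len(gray[0])
    coords = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            cells = [(r, c)
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            if not all(mask[r][c] for r, c in cells):
                continue  # patch extends outside the annotation
            white = sum(1 for r, c in cells if gray[r][c] >= white_level)
            if white / len(cells) > max_white_frac:
                continue  # patch is mostly background
            coords.append((top, left))
    return coords
```

Applying the same coordinates to the registered H&E, PD-L1, and double IHC images would give patches from the same position in each image type, as in figure 1B.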

Patch-based pipeline

In the patch-based pipeline step, the confidence score for the 1-year PFS was predicted for each patch. The patch-based pipeline comprised two phases: a “feature extraction module” (figure 1C) and a “patch-based classifier” (figure 1D). We used EfficientNet,25 a widely used architecture for image recognition tasks, as the feature extraction module. Specifically, a pretrained EfficientNet-B0 model that was trained on the ImageNet dataset was used.26 Since we wanted to directly use image features that were learned on a large variety of images, retraining and fine-tuning of the EfficientNet model were not performed. The “feature extraction module” yielded 1280-dimensional features for each patch. To calculate the final score based on the features, a patch-based classifier consisting of three fully connected layers was constructed. First, the features from the three image types were concatenated as the input of the patch-based classifier. The resulting 3840-dimensional (3×1280) features were then used to perform a two-class classification. The discrepancy between the estimated class and the target class was measured using the sum of the cross-entropy loss, which was computed in the same way for all patches. The PyTorch framework27 was used to train the network with graphical processing units. The drop-out rates were set to 25% for the first layer and 50% for the second layer to reduce overfitting. The Adam optimizer was used for stochastic gradient-based optimization with a batch size of 48.

Per-patient pipeline

In this step, histogram features were calculated from the results of the patch-based prediction for each patient, and the patient-based prediction was performed by concatenating the histogram features and patient information (figure 1E). The probability of achieving 1-year PFS output for each patch by the patch-based classifier was used for the histogram. A histogram confidence score of 20 bins was obtained. Since the number of patches for each patient differed, each histogram was normalized to a sum of 1. Thus, a histogram confidence score of 0%–5% showed the proportion of the patches that predicted 0%–5% probability among all patches for a patient. Then, the per-patient image features were combined with the corresponding patient information. The following 21 pieces of information were used: sex, age, Eastern Cooperative Oncology Group performance status score, smoking status, TNM classification, history of autoimmune disease, systemic immunosuppressive therapy status, pre-existing interstitial lung disease, history of malignancy other than SCLC within the last 5 years, history of thoracic radiotherapy, brain metastasis, liver metastasis, bone metastasis, malignant pleural effusion and pericardial effusion, adrenal metastasis, neutrophil count, platelet count, hemoglobin level, serum bilirubin level, serum aspartate transaminase and alanine transaminase levels, and serum creatinine level.28–31 The resulting 43-dimensional feature (20 histogram bins plus 23 patient information dimensions) was defined as the per-patient information. Finally, a patient-based classifier was prepared to predict the confidence score for the 1-year PFS of each patient (figure 1F). The Gradient Boosting Decision Tree algorithm was used as the classifier, and the LightGBM library32 was used to train it. LightGBM was used in the patient information model, the pathological image model, and the combined model, with 23-, 20-, and 43-dimensional inputs, respectively.
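
The histogram construction can be sketched in a few lines. This is a minimal illustration assuming equal-width bins over 0 to 1 and that a probability of exactly 1.0 falls into the last bin; the function names are hypothetical.

```python
def histogram_features(patch_probs, n_bins=20):
    """Normalized histogram of per-patch probabilities (sums to 1).

    Equal-width bins over [0, 1] are assumed; a probability of exactly
    1.0 is placed in the last bin.
    """
    if not patch_probs:
        raise ValueError("at least one patch is required")
    counts = [0] * n_bins
    for p in patch_probs:
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(patch_probs)
    return [c / total for c in counts]


def per_patient_vector(patch_probs, patient_info):
    """Concatenate the 20 histogram bins with the patient information."""
    return histogram_features(patch_probs) + list(patient_info)
```

With 23 patient information values, `per_patient_vector` returns a 43-element list, matching the input dimension of the combined model described above.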

Statistical analysis

The primary outcome was the accuracy of machine learning models in predicting the 1-year PFS. The area under the ROC curve was calculated to evaluate the accuracy of the machine learning model. In addition, the area under the ROC curve was calculated to evaluate the predictive accuracy of CD8 count, FoxP3 count, and PD-L1 CPS. The Kaplan-Meier method was used to estimate the PFS, and the groups were compared using the Cox proportional hazard model. Results were expressed as HRs and 95% CIs. An area under the curve (AUC) of ≥0.7 was used to denote a good predictive value, and a two-sided p<0.05 was considered significant.
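
For readers less familiar with the metric, the area under the ROC curve equals the probability that a randomly chosen patient who achieved 1-year PFS receives a higher score than a randomly chosen patient who did not. A minimal sketch of that pairwise definition follows; the function name is hypothetical, and this is not the software used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for patients who achieved 1-year PFS, 0 otherwise.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under the criterion stated above, a value of 0.7 or higher returned by such a function would be read as a good predictive value.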

Results

Patients’ characteristics

We analyzed 78 patients from a prospective cohort of 207 patients with ES-SCLC who received chemoimmunotherapy as first-line treatment. The overall study result was already published elsewhere.19 The patient selection process is shown in online supplemental figure 2, while the baseline clinical characteristics are shown in table 1. The patients’ median age was 78 years. Thirty patients (38%) were aged ≥75 years. A total of 67 patients (86%) had a performance status score of 0 or 1 when first-line therapy was initiated. The median follow-up period was 394 days (IQR: 200–484), and the minimum follow-up period for all patients was 365 days. The overall objective response rate, median PFS, and overall survival (OS) of all patients were 72.4% (55/78), 145 days, and 394 days, respectively. The 1-year PFS rate was 10.3% (8/78).

Table 1

Baseline clinical characteristics of the patients

Accuracy of predictive models for treatment efficacy using machine learning analysis

Among the analyzed patients, 8 achieved 1-year PFS, whereas 70 did not. We obtained 100,544 patches of pathological images from all patients; of these, 9878 patches were from patients who achieved 1-year PFS, and 90,666 patches were from those who did not. Using a fivefold cross-validation approach, the predictive models showed the following mean AUC values: 0.789 (95% CI 0.671 to 0.922) for the patient information model, 0.782 (95% CI 0.735 to 0.844) for the pathological image model, and 0.868 (95% CI 0.818 to 0.910) for the combined model. The ranges of AUC values across all folds were 0.571–0.982, 0.750–0.911, and 0.786–0.929 in the patient information model, pathological image model, and combined model, respectively. The ROC curves are shown in figure 2. In addition, the predictive models showed the following mean c-indices: 0.731 (95% CI 0.656 to 0.807) for the patient information model, 0.754 (95% CI 0.725 to 0.783) for the pathological image model, and 0.820 (95% CI 0.749 to 0.892) for the combined model. To confirm the accuracy of this analysis, we excluded the three patients who were censored within 1 year and retrained the model on a total of 75 patients. The predictive models showed the following mean AUC values: 0.754 (95% CI 0.695 to 0.812) for the patient information model, 0.786 (95% CI 0.747 to 0.824) for the pathological image model, and 0.818 (95% CI 0.745 to 0.855) for the combined model. The ROC curves are shown in online supplemental figure 4.

Figure 2

Performance of the patient-based classifier was evaluated using the area under the receiver operating characteristic (ROC) curve value for 1-year progression-free survival. (A) Patient information model. (B) Pathological image model. (C) Combined model. AUC, area under the curve.

Predictive model stratifying patients treated with immunotherapy by PFS

The patients were classified into high and low efficacy groups using the predictive model to examine its predictive accuracy for PFS. Since five models were created using a fivefold cross-validation approach, the PFS of the high and low efficacy groups was estimated for each fold. The Kaplan-Meier curves for PFS are shown in figure 3 for each median accuracy model and in online supplemental figure 3 for all folds. The PFS was longer in the high efficacy group than that in the low efficacy group when determined using the patient information model (HRs (95% CIs) 0.714 (0.425 to 1.202), 0.676 (0.422 to 1.082), 0.468 (0.287 to 0.762), 0.507 (0.281 to 0.913), and 0.648 (0.400 to 1.049) for each fold, respectively), pathological image model (HRs (95% CIs) 0.089 (0.026 to 0.303), 0.125 (0.044 to 0.357), 0.334 (0.177 to 0.628), 0.411 (0.236 to 0.714), and 0.524 (0.312 to 0.881) for each fold, respectively), and combined model (HRs (95% CIs) 0.152 (0.062 to 0.370), 0.266 (0.141 to 0.500), 0.353 (0.195 to 0.637), 0.411 (0.236 to 0.714), and 0.420 (0.240 to 0.732) for each fold, respectively). Further, the Kaplan-Meier curves for PFS of the model excluding the patients censored within 1 year are shown in online supplemental figure 5.

Figure 3

Kaplan-Meier curve estimates for progression-free survival (PFS) of the high and low efficacy groups as predicted using our predictive models and the median accuracy for each model. (A) Patient information model. (B) Pathological image model. (C) Combined model.

Interpretative importance score from the trained model

The relative importance of each variable for determining the 1-year PFS achievement was calculated using each predictive model; the importance scores for each model are shown in table 2. Since we used LightGBM to train the final classifiers, the importance of each feature can be calculated from the number of times that feature was used for a split during the learning process.32 The patient information model demonstrated that age was the most important factor, followed by the presence of malignant pleural effusion and pericardial effusion, smoking status, presence of bone metastasis, and presence of liver metastasis. In the predictive model based on the pathological images, the histogram confidence score consisting of 20 bins was calculated based on the results of patch-based prediction for each patient. The histogram confidence score reflects the percentage of patches predicting a given probability of 1-year PFS achievement in the patch-based prediction. The pathological image model demonstrated that the histogram confidence score of 0%–5% had the highest importance, followed by histogram confidence scores of 95%–100%, 15%–20%, 5%–10%, and 10%–15%. The combined model demonstrated that a histogram confidence score of 0%–5% had the highest importance, followed by the histogram confidence scores of 5%–10%, 10%–15%, 95%–100%, and 15%–20%. In the combined model, the pathological image features had higher importance scores than the patient information features.

Table 2

Interpretative importance score from the patient-based classifier

Accuracy in predicting the achievement of 1-year PFS based on the manual cell count

To examine the usefulness of machine learning-based evaluation, the predictive value of manually analyzed pathological images was determined. The densities (medians (ranges)) of CD8-positive TILs and FoxP3-positive TILs were 18 (0–668)/mm2 and 32 (0–541)/mm2, respectively. The median (range) of PD-L1 CPS was 0 (0–100). The densities of CD8-positive TILs and FoxP3-positive TILs were comparable between patients who achieved and those who did not achieve the 1-year PFS (CD8-positive TILs: median, 52/mm2 vs 23/mm2, p=0.286; FoxP3-positive TILs: median, 42/mm2 vs 30/mm2, p=0.396). The PD-L1 CPS was also comparable between patients who achieved and those who did not achieve the 1-year PFS (median, 0 vs 0; p=0.325). For the 1-year PFS, the CD8-positive TILs, FoxP3-positive TILs, and PD-L1 CPS demonstrated the AUC values of 0.681, 0.626, and 0.567, respectively. The ROC curves are shown in figure 4.

Figure 4

Performance of the human count data from the pathological images was evaluated using the area under the curve estimates for 1-year progression-free survival. (A) CD8-positive TILs. (B) FoxP3-positive TILs. (C) PD-L1 Combined Positive Score (CPS). AUC, area under the curve; PD-L1, programmed cell death ligand 1; TILs, tumor-infiltrating lymphocytes.

Discussion

Here, we used prospective cohorts and investigated the predictive value of a machine learning model that used pathological images and patient information for determining the efficacy of immunotherapy in patients with ES-SCLC who received chemoimmunotherapy as first-line treatment. The pathological image model was found to have a good performance in determining the efficacy of immunotherapy. In the combined model using patient information and pathological images, the importance of the pathological images in predicting the achievement of 1-year PFS was higher than that of patient information. To the best of our knowledge, this is the first study to show that a classification of the TIME using machine learning could be a biomarker for immunotherapy efficacy in SCLC.

The primary objective of our study was to identify a potential biomarker of immunotherapy efficacy in SCLC, where biomarkers have not been established. Thus far, PD-L1 expression, tumor mutation burden, and microsatellite instability have been approved by the Food and Drug Administration as companion biomarkers of immunotherapy response in other tumor types. The reported AUC range for these companion biomarkers was 0.683–0.755,20–22 which denotes their utility in clinical practice. In our pathological image model, the mean AUC for the treatment efficacy was 0.782, which was comparable to that of the approved companion biomarkers. Furthermore, the lowest AUC among the five folds was 0.750. This result may indicate that the pathological image model had good generalization performance, as the model from each fold displayed high accuracy regardless of the random assignment of patients. Therefore, our pathological image model has sufficiently high predictive performance compared with the approved companion biomarkers of other tumor types and can be a biomarker of immunotherapy efficacy in SCLC.

This study showed that the pathological images were more important for predicting the efficacy of immunotherapy than the patient information. In several tumor types, the efficacy of immunotherapy differs depending on patient factors such as age, smoking status, metastasis site, and performance status.28–31 Although these factors were important in the patient information model, “age” was the only patient information-related factor that was among the top 10 most important factors in the combined model, while the rest were pathological image factors. Our findings have shown that the TIME may play a more important role in predicting the antitumor efficacy of immunotherapy than the patient background. Hence, the TIME status should be considered when developing future biomarkers for determining immunotherapy efficacy.

Machine learning-based analysis outperforms manual TIL assessment when predicting the efficacy of immunotherapy, although the underlying reason or mechanism remains unknown. However, the spatial analysis of the relationship between each inflammatory cell and tumor is important for making accurate predictions. In previous studies, the spatial evaluation of TILs based on nuclear segmentation33 and deep convolutional neural networks from pathological images34 were used to predict immunotherapy efficacy. However, these studies were limited by difficulties in performing a spatial analysis that included the tumor area and the role of lymphocytes. In our study, tumor area annotation and lymphocyte IHC were used to evaluate the tumor and lymphocyte localization. Lymphocyte localization (stromal or intratumoral) affected the lymphocyte activity when examining the antitumor immune status.35–37 Thus, a spatial analysis of the relationship between each inflammatory cell and the tumor is important for evaluating the TIME status, which can predict the immunotherapy efficacy.

When evaluating the TIME, machine learning analysis offers better objectivity, which is essential for enhancing the reliability of a biomarker in clinical practice. In contrast, a manual spatial analysis performed by a human has substantial interobserver variability38 39 that can negatively impact the findings of studies and clinical trials. Using machine learning, we were able to conduct an objective spatial analysis that humans cannot achieve.4 5 16 40 Based on these perspectives, machine learning will be important for analyzing complex pathological images for use as a biomarker in clinical practice.

Our predictive biomarker was developed using only three pathological slides without the need for any special equipment or techniques. Although RNA analysis has been suggested for classifying the immune status of SCLC patients,41 it requires a large amount of specimen. In general, patients with advanced-stage lung cancer are often diagnosed through a bronchoscopic examination, but many of them are unable to undergo biomarker testing due to the inadequate tissue volume obtained.42 Therefore, testing that works with a small number of specimens or a small specimen size is critical in clinical practice. Moreover, H&E staining and IHC do not require special equipment or techniques. Thus, our method offers a relatively simple way to identify reliable biomarkers in routine clinical practice.

Our study has some limitations. First, the amount of available data was limited. Because the weights and biases of the model are learned from the training data, this placed a ceiling on how far the machine learning analysis could be optimized. However, a level of classification performance acceptable for clinical application can be achieved using only 100 cases,34 43–45 and the number of cases required for model training is declining. Second, although fivefold cross-validation was conducted, validation in an external independent cohort was not performed. Therefore, our predictive model may be overfitted, and its accuracy may be overestimated. In fact, it was not possible to evaluate all folds with a common threshold, such as one calculated from the Youden index of the average ROC. However, the models from all folds showed the same tendency in feature importance as the final model. Although further validation with more data is needed in the future, we believe that our models were able to extract the key features for a biomarker. Finally, because our predictive model was developed by deep learning and validated on the H&E-stained images, the PD-L1 IHC images, and the CD8_FoxP3 IHC images in parallel, the specific contribution of each pathological image to the prediction model and to the interpretation of the TIME remains unclear.
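To make the threshold issue concrete, the following sketch computes the cut-off that maximizes the Youden index, J = sensitivity + specificity - 1, separately for two folds. It is an illustration only and does not reproduce the analysis in this study: the function `youden_threshold`, the fold data, and the choice of "PFS of at least 1 year" as the positive class are assumptions made for the example.

```python
from typing import Sequence, Tuple


def youden_threshold(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its score is >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_threshold, best_j = float("inf"), float("-inf")
    # Every distinct score is a candidate cut-off, from strictest to loosest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        sensitivity = tp / positives
        specificity = 1 - fp / negatives
        j = sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical held-out predictions for two folds (1 = PFS of at least 1 year).
fold_a_labels = [1, 1, 0, 0, 0]
fold_a_scores = [0.9, 0.7, 0.6, 0.3, 0.2]
fold_b_labels = [1, 1, 0, 0, 0]
fold_b_scores = [0.5, 0.4, 0.3, 0.2, 0.1]

threshold_a, j_a = youden_threshold(fold_a_labels, fold_a_scores)
threshold_b, j_b = youden_threshold(fold_b_labels, fold_b_scores)
print(f"fold A: threshold={threshold_a}, J={j_a}")
print(f"fold B: threshold={threshold_b}, J={j_b}")
```

In this toy example each fold separates its cases perfectly, yet the optimal cut-offs differ, so the cut-off chosen for fold A would miss every positive case in fold B. Score scales that shift between folds in this way are one reason a single common threshold can be hard to apply.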

In conclusion, a machine learning analysis of TIME using pathological images can predict the efficacy of immunotherapy in SCLC, which has no established biomarker. An objective spatial analysis is critical for evaluating the TIME status and predicting the efficacy of immunotherapy. Evaluating the TIME using machine learning analysis may help determine the efficacy of immunotherapy.

Data availability statement

The corresponding author declares that he had full access to all the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis. The data supporting the findings of this study are available from the corresponding author on reasonable request. The data are not publicly available due to privacy and ethical restrictions.

Ethics statements

Patient consent for publication

Ethics approval

The Wakayama Medical University institutional review board (No. 2805) approved the protocol. The study protocol and all amendments were approved by the independent institutional review board of each participating institution. The study was conducted in accordance with the principles of the Declaration of Helsinki. All patients provided written informed consent prior to study enrolment.

Acknowledgments

We would like to thank the patients, clinical staff, data manager of Osaka Metropolitan University, and other support staff of Wakayama Medical University.

Footnotes

  • Contributors RS: conceptualization, data curation, formal analysis, investigation, methodology, project administration, writing–original draft, and writing–review and editing. DF: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, writing–original draft, writing–review and editing, and the guarantor. TN and AS: formal analysis (machine learning analysis), methodology, writing–original draft, and writing–review and editing. YKitamura and JF: formal analysis (pathological analysis), methodology, writing–original draft, and writing–review and editing. YS, TK, HM, TY, SM, AH, MT, YT, JS, NF, HT, NY and YKoh. HA: conceptualization, data curation, investigation, writing–original draft, and writing–review and editing.

  • Funding This work was supported by Chugai Pharmaceutical.

  • Competing interests The authors declare the following financial interests/personal relationships, which may be considered potential competing interests. DF has received personal fees from AstraZeneca KK, Boehringer Ingelheim Japan, Ono Pharmaceutical, Bristol-Myers Squibb, Taiho Pharmaceutical, Chugai Pharmaceutical, MSD KK, Eli Lilly Japan KK, Kyowa Kirin, and Novartis Pharma KK, outside of the submitted work. YKitamura is the chief executive officer and a stockholder of N Lab Co., Ltd. JF is an advisor of N Lab Co., Ltd. YS has received personal fees from AstraZeneca, Chugai Pharmaceutical, MSD, Ono Pharmaceutical, Novartis, Pfizer, Taiho Pharmaceutical, Nippon Kayaku, Bristol-Myers Squibb, Eli Lilly, Takeda, and Kyowa Kirin, outside of the submitted work. TK has received personal fees and scholarship donations from Chugai Pharmaceutical, outside of the submitted work. HM has received personal fees from Chugai Pharmaceutical, outside of the submitted work. TY has received personal fees from Chugai Pharmaceutical, outside of the submitted work. SM has received personal fees from Chugai Pharmaceutical, Taiho Pharmaceutical, Ono Pharmaceutical, AstraZeneca, Bristol-Myers Squibb, Eli Lilly, Boehringer-Ingelheim Japan, and Takeda Pharmaceutical, outside of the submitted work. AH has received personal fees from Chugai, outside of the submitted work. YT has received personal fees from Chugai, outside of the submitted work. HT has received personal fees from Chugai Pharmaceutical, Ono Pharmaceutical, AstraZeneca, Bristol-Myers Squibb, Boehringer-Ingelheim Japan, and Pfizer Japan, outside of the submitted work. NY has received research funds from Chugai Pharmaceutical, outside of the submitted work.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.