Background We present a computational approach (ArcTIL) for quantitative characterization of the architecture of tumor-infiltrating lymphocytes (TILs) and their interplay with cancer cells from digitized H&E-stained histology whole slide images and evaluate its prognostic role in three different gynecological cancer (GC) types and across three different treatment types (platinum, radiation and immunotherapy).
Methods In this retrospective study, we included 926 patients with GC diagnosed with ovarian cancer (OC), cervical cancer, and endometrial cancer with available digitized diagnostic histology slides and survival outcome information. ArcTIL features quantifying architecture and spatial interplay between immune cells and the rest of nucleated cells (mostly comprised cancer cells) were extracted from the cell cluster graphs of nuclei within the tumor epithelial nests, surrounding stroma and invasive tumor front compartments on H&E-stained slides. A Cox proportional hazards model, incorporating ArcTIL features was fit on the OC training cohort (N=51), yielding an ArcTIL signature. A unique threshold learned from the training set stratified the patients into a low and high-risk group.
Results The seven feature ArcTIL classifier was found to significantly correlate with overall survival in chemotherapy and radiotherapy-treated validation cohorts and progression-free survival in an immunotherapy-treated validation cohort. ArcTIL features relating to increased density of TILs in the epithelium and invasive tumor front were found to be associated with better survival outcomes when compared with those patients with an increased TIL density in the stroma. A statistically significant association was found between the ArcTIL signature and signaling pathways for blood vessel morphogenesis, vasculature development, regulation of cell differentiation, cell-substrate adhesion, biological adhesion, regulation of vasculature development, and angiogenesis.
Conclusions This study reveals that computationally-derived features from the spatial architecture of TILs and tumor cells are prognostic in GCs treated with chemotherapy, radiotherapy, and checkpoint blockade and are closely associated with central biological processes that impact tumor progression. These findings could aid in identifying therapy-refractory patients and further enable personalized treatment decision-making.
- computational biology
- genital neoplasms
- tumor microenvironment
Data availability statement
All data relevant to the study are included in the article or uploaded as supplementary information.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.
Statistics from Altmetric.com
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.
- computational biology
- genital neoplasms
- tumor microenvironment
After breast cancers, gynecological malignancies are the most common cancer in women, with 114,000 cases occurring annually in the USA, and are the third leading cause of cancer-associated deaths, with 33,600 reported annually.1 Three types of cancers account for the vast majority of the female genital malignancies: cervical cancer (CC), ovarian cancer (OC), and endometrial cancer (EC). Collectively, the 5-year overall survival (OS) rate of gynecological cancer is less than 60%, with epithelial OC having a 5-year OS rate of less than 40%.2 3
The standard of care for advanced-stage metastatic OC and EC consists of either cancer-directed debulking surgery, followed by adjuvant platinum–taxane chemotherapy or neoadjuvant chemotherapy (NACT), followed by interval cytoreductive surgery.4–6 Chemotherapy with weekly cisplatin is widely used concurrently with radiotherapy in locally advanced CC.7 Additionally, immunotherapy with immune checkpoint inhibition (ICI) is an increasingly popular treatment given its durable effect and lower toxicity and is currently under extensive investigation in cancers of the female genital tract.8
Although the majority of ECs are curable by surgery with or without radiotherapy, a small fraction with advanced metastatic or recurrent disease still exhibit poor prognosis, lower response to therapy and poor survival.9 Platinum-based combination chemotherapy is the standard chemotherapy regimen in these settings. In OC, despite initial responsiveness to platinum chemotherapy, more than 50% of patients with advanced metastatic disease experience disease recurrence.10 Patients with OC whose cancers recur within 6 months of platinum-based chemotherapy are considered to have platinum-resistant OC (PROC). The response rate of any further chemotherapy in PROC is less than 10%–15%, and the median OS is less than 12 months.10 The efficacy of other treatments, such as immunotherapy and radiotherapy, varies across patients.11 It is therefore critical to identify high-risk patients who are likely to recur after platinum-based chemotherapy and those with platinum-resistant disease. Further, immunotherapy with ICI is a promising option with durable response and low toxicity. However, only small fraction of patients with gynecological cancers (GC) apparently benefits from it. Consequently there is a need for better biomarkers to identify those patients who are likely to benefit from prolonged therapy, like maintenance therapy with bevacizumab or poly adenosine diphosphate-ribose polymerase inhibitors and/or personalized targeted therapy to lower risk of recurrence and disease progression as also those patients who will receive durable benefit from ICI agents.12
Tumor microenvironment (TME) is the whole environmental components around the tumor mass containing heterogeneous cells from epithelial and stromal compartment.13–15 Different attributes of TME have been reported as prognostic markers in GC.9 16 17 Tissue-based transcriptome data have confirmed that tumor-associated cells, specifically tumor-infiltrating lymphocytes (TILs) density and immune cells in stroma, are prognostic TME-associated markers in various cancer types.18–20 However, due to the inter/intra-observer variability in interpreting TME characteristics,19 there are no clearly-defined standardized methods for assessing spatial interaction of cells within TME, limiting the ability to use them as clinical biomarkers.21 22 Interestingly, some computational methods have investigated the geospatial patterns of TILs, these geospatial TIL patterns from H&E slides characterize the overall immune profile of a tumor have been associated with prognosis in different cancer types including lung cancer and breast cancer but not yet for GC.23–25
This work presents a new computational pathology approach, called ArcTIL, to quantitatively characterize and evaluate spatial patterns relating to the architectural interplay and arrangement of TILs and cancer cells in different tumor tissue compartments, namely (1) epithelial nests, (2) surrounding stroma, and (3) epithelial and stromal compartments at the invasive tumor front. Our approach differs from previous related approaches,23 24 in that we introduce a novel set of geospatial features relating to patterns of immune cell in stroma, epithelium and invasive tumor front. Further, considering the similarities in etiologic factors, rate of coincidence,26 and morphologic patterns of OC, CC, and EC, we evaluated our novel ArcTIL patterns across all three diseases. The ArcTIL features were evaluated on 926 patients from multiple sites and cancer types (OC, CC, and GC) in terms of clinical benefit and survival following adjuvant platinum-based chemotherapy or radiotherapy and ICIs.
Data set description
A total of 926 GC cases (whole slide H&E-stained resected neoplasms) were acquired from The Cancer Genome Atlas (TCGA) (n=877) and Cleveland Clinic Foundation (CCF) (n=49) (figure 1). Five hundred and four patients from the TCGA cohort were treated with chemotherapy. One hundred and three patients with OC who underwent surgery followed by chemotherapy were randomly divided between a training cohort (D0, N=51) and a testing cohort (D1, N=52) and are designated cohorts D0 and D1 (for OC). The other patients from this data set who were treated with chemotherapy after surgery, were divided into cohorts D2 (for CC) and D4 (for EC). Three hundred and seventy-three patients with CC and EC from the TCGA cohort were treated with radiotherapy after surgery and formed cohorts D3 and D5, respectively. There were no patients who received both chemotherapy and radiotherapy. The Cleveland Clinic Foundation cohort included 49 patients treated with immunotherapy agents including pembrolizumab, nivolumab±ipilimumab, and avelumab, all in the recurrent setting (D6 for OC, D7 for CC, and D8 for EC. We pulled all the cases admitted to CCF, with diagnosis of OCs, CCs or ECs who underwent surgical intervention at CCF between 2004 and 2019 treated with ICI. Scanning of CCF sections was performed specifically for this study and in consensus with their treating physician. As for the outcome, patients were assessed with at least two cycles of interval CT imaging. Using the comparison of two imaging assessments, the patients were labeled as responders, if they had evidence of decreased radiologic tumor burden with a partial or complete response by (Response Evaluation Criteria in Solid Tumours) RECIST V.1.1.
For the TCGA cases, resected tumor tissues underwent the following analysis steps including (1) formalin-fixing and paraffin-embedding, (2) staining with H&E and (3) digitizing as whole slide images (WSI) at 40× magnification (0.2527 microns per pixel resolution) using an Aperio ScanScope. The output was saved in the SVS file, which is a digital slide image file and contains a series of TIFF images, as well as its image description tag. The scanner information was collected from the SVS file metadata and staining process details were obtained from the available TCGA documentation (information available at (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga).
For the slides pulled from CCF, the exact same staining procedure was employed. Also, scanning and compression of WSIs were performed under similar conditions and scanner settings. For the CCF patients, only one representative slide image was considered per patient. However, for the TCGA, some patients had multiple tissue slides and thereby multiple SVS files. For those cases with tumor masses spanning multiple WSIs, patches were extracted from all tumor regions across all slides and used for downstream analysis. As the progression-free survival (PFS) was missing for most of the TCGA cases, the endpoint of interest for D0–D8 was OS. For CCF studies, the PFS data were used to evaluate the prognostic ability of the model.
D0 was employed for feature discovery and training the ArcTIL model for prognosticating OS. D1–D8 were used for independently evaluating the prognostic ability of the model. Prior to analysis, WSIs underwent a quality control (QC) process27 to eliminate poor quality images inappropriate for the study due to the presence of artifacts, cracked tissue, or blurriness. No slides were eliminated on account of the QC procedure; however, some regions with imperfect quality in otherwise acceptable slides, including improperly-stained or tissue-devoid regions, were identified by the QC process and excluded. As for the histo-genomic analysis, we gathered all the histo-genomic data available for the TCGA patients, which included a total of 84 patients with OC, 267 patients with CC, and 261 patients with EC.
Tissue compartment segmentation
To segment the tissue compartments, WSIs at 10× magnification were processed by a computational segmentation model that was previously trained on estrogen receptor positive breast cancer tissue image patches.28 This model assigned a value to each pixel, reflecting the likelihood that the pixel is part of the epithelium. This probabilistic epithelial mask was then converted to a binary mask by a likelihood threshold (0.8) and then processed by morphological operations (by removing connected components smaller than 2000 pixels (125 square microns)) (figure 2D). The performance of the method was qualitatively evaluated by a pathologist (P1) to ensure adequate identification of epithelial components. Threshold selection and post-processing steps were chosen to maximize the qualitative agreement between the binary masks and the P1’s assessment. 3000×3000-pixel patches were then categorized by the ratio of epithelial tissue to stromal tissue into three types of compartments: (1) epithelium (ratio of epithelium to stroma >3) (2) stroma (ratio of stroma to epithelium >3), and (3) invasive tumor front (ratio of epithelial region to stroma between 1/3 and 3). A representative image patch from tumor edge with its corresponding binary masks is shown in figure 2B,D. Given that the paraffin-embedded blocks were selected for the largest quantity and the highest quality of tumor, the selected blocks are unlikely to contain significant components of benign epithelium. Thus, we did not design a systematic way to exclude non-neoplastic non-invasive regions of epithelium and made the implicit assumption when dividing sections into epithelial and stromal components that the epithelium contains a high enough proportion of invasive carcinoma such that rare benign epithelial elements would not impact our model.
Nuclear segmentation and TIL classification
First, all the nuclei were automatically segmented using the method of Veta et al specifically tuned for nuclei segmentation of H&E images.29 This method applies the radial symmetric transform to find the controlled markers and then uses the watershed method at different scales. Then, the method of Corredor et al30 was used to classify each segmented nucleus as either a TIL or non-TIL. This classifier is a support vector machine with a linear kernel that utilizes image-derived features of each segmented nucleus related to texture, shape, and color. Finally, four types of nuclei were defined depending on the compartment they were in: (1) epithelial TILs, (2) epithelial non-TILs, (3) stromal TILs (S-TILs), and (4) stromal non-TIL (figure 2E).
Quantitative immune profile of TILs
Spatial graph construction
The spatial arrangement and density of TILs were characterized by using the method introduced by Corredor et al.30 First, cell proximity clusters of each nuclei class were constructed using specified vertices and edges,31 where the vertices were built on individual nuclei within a cell class. For constructing the edges, a connection rule based on Euclidean distance was defined between two given vertices. This rule favors connectivity between proximal cell nuclei and defies connectivity between distal cell nuclei, yielding colonies of proximal nuclei that are closer than 20 microns. This threshold was tuned to match the distance needed to form the immune cell cytotoxic synapse which induces immunologic memory and triggers the death domains.32 33 Four classes of proximity graphs are depicted in one representative patch (figure 2F).
For each patch within the individual tissue compartments, features relating to the geospatial architecture and interplay of TILs and cancer nuclei were extracted. The features described either the spatial arrangement of each cell class alone (eg, the area of epithelial TIL foci), or the relationship between at least two classes of cell types (eg, interplay between TILs and cancer nuclei, the ratio of cluster nuclear density between neighboring TIL and non-TIL clusters in stroma, or the variance between the number of connections (edges) of all four cell groups, or the ratio between the number of TILs and the total number of nuclei). To convert patch-level features to patient-level features, six statistics (mean, median, minimum, maximum, range, and variance) were calculated on the total number of epithelial patches, stromal patches, and tumor edge patches in the WSIs available for a given patient yielding 21,096 features per patient.
OS was measured from the date of diagnosis to the date of death and was censored at the date of last follow-up for survivors. PFS was measured from the date of diagnosis to the date of progression of the disease or the date of death, whichever occurred earlier, and was censored for those alive without disease progression at the date of last follow-up. D0 was employed for feature discovery and training a regression model for prognosticating OS. D1–D8 were used for independently validating the prognostic ability of ArcTIL. Using the ‘glmnet’ R package V.3.6.3, the whole array of features were fed to the least shrinkage and selection operator (LASSO) method to select the most prognostic ArcTIL features for inclusion in a Cox regression model.34 LASSO imposes a penalty constraint on the sum of the absolute values of the model regression contributions, causing coefficients of most of the variables to iteratively shrink down to zero. All features were first standardized to have a mean of zero and the SD of one relative to the training cohort, so that HR and feature weights would be comparable across features. The model was fitted on D0 to yield the multivariable ArcTIL signature. Using a 10-fold cross validation scheme and corresponding to the lowest partial likelihood deviance, the penalty term λ was tuned. The variables corresponding to the non-zero coefficients were included in the Cox regression model, in turn constituting the ArcTIL risk score (linear combination of the features and their corresponding coefficients). The ArcTIL risk score for each patient reflects an estimated risk for OS or PFS. A threshold on training patients’ ArcTIL risk scores was chosen to maximize the HR between the two risk strata in D0. Also, additional risk stratification experiments were performed to categorize an individual patient into one of three risk levels. Risk thresholds to distinguish the three levels (ultra-low, intermediate and ultra-high) were obtained by dividing the risk scores in D0 into tertiles. ArcTIL was subsequently validated on D1–D5 for prognosticating OS and D6–D8 for prognosticating PFS. Kaplan-Meier survival analysis with the log-rank test was used to examine the differences of time-to-event data (PFS/OS) between patient groups categorized by the ArcTIL risk classifier. The model performance was summarized by relative HRs, their 95% CIs using the Wald test, the log-rank rank test, and Harrell’s concordance indices (C-indices) on cohort D1–D8 (using survfit and coxph functions in R). C-index is defined as the proportion of concordant pairs divided by the total number of possible evaluation pairs of validation data. P values were two-sided, and all values ≤0.05 were considered statistically significant.
Clinicopathologic factors of cohorts in the study
Clinical and pathological data characteristics of all patients including histology, treatment strategy, and demographics are summarized in table 1.
Prognostic value of the ArcTIL signature
The Cox regression model trained on D0 comprised seven ArcTIL features that were prognostic of OS. Two features were from the epithelial compartment, two from the stroma, and three from the tumor edge. Details of the selected ArcTIL features and their HRs are listed in table 2. The binary ArcTIL signature was prognostic of OS in all chemotherapy and radiotherapy-treated patients (chemotherapy OC (D1, p=0.04, HR=1.96, 95% CI=(1.02 to 3.76), C-index=0.69), chemotherapy CC (D2, p=4e−5, HR=6.70, 95% CI=(2.70 to 16.60), C-index=0.88), radiotherapy CC (D3, p=0.005, HR=3.08, 95% CI=(1.40 to 6.79), C-index=0.78), chemotherapy EC (D4, p=0.002, HR=3.31, 95% CI=(1.57 to 6.96), C-index=0.79), radiotherapy EC (D5, p=2e−7, HR=6.99, 95% CI=(3.35 to 14.60)), C-index=0.85, figure 3). Also, in the ICI-treated cohorts (D6–D8), ArcTIL identified high-risk patients had significantly worse PFS than low-risk patients p=0.009, HR=2.98, 95% CI=(1.31 to 6.78), C-index=0.62 (figure 3). As illustrated in online supplemental figure 1, the p value suggests statistically significant separation of survival curves in all validation cohorts, except D1. The separation of three risk levels was statistically significant in D2–D8. Also, as can be appreciated from online supplemental figure 1, the ultra-high versus ultra-low risk cases were statistically significantly different in terms of OS/PFS in all validation cohorts (D1–D8). In the immunotherapy-treated cohort, referring to D6–D8 (panel E), the survival curves of intermediate and ultra-high seems overlapped in most of the time with HR=1.26, 95% CI=(0.51 to 3.13), suggesting that ultra-low risk cases are clearly separable against the rest of the cohort. The patterns of cell arrangements were consistently different between long and short OS/PFS patients across all three cancer types and all three treatment choices, see figure 4 and online supplemental figure 2.
Relative to ArcTIL-defined high-risk patients, low-risk patients had more TILs in the epithelium and invasive tumor front (feature 2 and 5), fewer TILs in the stroma (feature 6) and less mixing of stromal cells and TILs (feature 3). In ArcTIL-defined high-risk patients, epithelial TILs were scarce (features 1 and 2), non-TIL stromal cells that are located within 20 microns of another non-TIL stromal cell cluster were elevated (feature 3), and S-TIL to stromal non-TIL population decreased (feature 4). Both preceding features (feature 3 and 4) resemble immune-desert characteristics in ArcTIL-defined high-risk strata. Additionally, in ArcTIL-identified high-risk cases, the presence of large swaths of tumor cells uninterrupted by TILs produced a large variation in the size of non-TIL cell clusters (feature 7), unlike in low-risk cases where more uniform TIL dispersion produced more homogenous non-TIL cluster sizes (feature 7). Three prognostic features relating to dispersion of TILs throughout the tumor (indices 2, 3, 5) are illustrated in figure 4.
Our results suggest that for all three GCs and regardless of the choice of therapy, a consistent pattern of geospatial immune profile were found to be prognostic of survival and progression of disease. This quantitative pattern, referred to as ArcTIL, comprised the area of S-TIL clusters at the tumor invasive front, and SD of the area of epithelial tumor cells. Both these features were observed to be elevated in patients who had shorter OS, or PFS time. Conversely, in patients with GC with longer survival times, the S-TILs were relatively scarce while epithelial TILs were abundant. In a previous study,35 this feature characterizes an attribute referred to as motility of S-TILs of TME in the literature, which was quantified by elongation of specific TIL populations, and has been shown to be associated with prolonged PFS. Also, clusters of cancer cells in low-risk patients were smaller as they were more frequently interrupted by TILs while high-risk patients had fewer fragmented cancer cell clusters. In summary, low-risk patients had abundant lymphocytes in close proximity to cancer cells in the epithelium, while in high-risk patients the lymphocytes were confined to the stroma.
The prognostic ability of this set of features in GC is in line with multiple other studies correlating spatial immune signature with cancer outcome23–25 36 and the differential morphologic pattern between good and poor prognostic outcomes was observed irrespective of therapy, histology, and cancer site. Prognostic value of ArcTIL even outperformed the stage variable, which is the current standard for gynecologic oncology.37 These findings also highlight a need for future strategies to enhance infiltration of TILs to the epithelium and counteract the immune exclusion likely induced by stroma.
Prior studies13 14 have reported a positive association of TIL architecture and OS, metastasis, and progression, specifically in patients with high-grade serous OC (HGSOC). Additionally, the prognostic ability of immune cell subtypes, particularly T and B cells, has been explored in clinical outcome and treatment response assessment using immunohistochemistry or immunofluorescence.36 38 These studies are all in line with the prognostic role of TILs that we established in this study. A prognostic benefit for patients with high intratumoral CD8 +TIL in HGSOC undergoing NACT was reported.39 Increased cytolytic T cell gene signatures have been reported in patients who responded well to NACT.40 While we did not explicitly explore the individual immune cell populations in this study, our findings indicate that the spatial location of immune cells (without distinguishing them as B or T cells) and their relation to cancer cells identified in the H&E slides were prognostic of outcome in GC. Furthermore, though the ArcTIL features that were identified as prognostic of survival were initially discovered from a high mortality cohort (OC, with short OS), the relevant features were also found to be prognostic in other cohorts with low mortality (EC and CC).
In the context of EC, prior studies have reported that treatment failure is mainly triggered by the interactions between the primary tumor mass and the surrounding stroma.9 In HGSOC, prior studies have reported that disease progression and metastasis is positively correlated with immune architecture within the TME.13–15 For CC, the majority of tumors are initiated by human papillomavirus infection, so the immune system defects play a significant role in cancer progression.16 Also, gene alterations related to T cell activation are shown to impact the CC survival rates.17 While we did not explicitly explore the specific TIL subpopulations that were implicated in response (since the study focused solely on H&E images), these previous studies appear to validate the prognostic ability of the spatial immune signature developed in our study.
Recently, a number of deep learning approaches have been presented for prognosticating outcome from WSI without the need to mine specific image characteristics.41 While in general, deep learning (DL) models can be employed to automatically interrogate large data sets without the need for domain-specific knowledge of the problem, the opacity of resulting features is often an impediment to interpreting model outcome.42 Conversely, an approach that relies on hand-crafted, engineered features does not need a large training cohort. A hand-crafted approach seeks to identify specific predefined patterns in the images, hence, it is typically not entirely dependent on the quantity within the learning sets.
As previously stated, since our machine learning approach involves a predefined engineered feature, increasing the sample set size has minimal improvement in model accuracy beyond a certain sample set size. Therefore, increasing the training set size does not result in a significant improvement in model accuracy or performance. Also, given the aforementioned issues with respect to model opacity as it concerns DL approaches, especially in the context of high stakes decisions like choice of treatment, a hand-crafted based approach might be more amenable for clinical adoption and guiding therapeutic management.43
The prognostic role of spatial architecture of TILs has been explored in other solid tumors including but not limited to lung, breast, rectum and their clinical relevance has been investigated via similar but not identical approaches to ArcTIL.23 24 Corredor et al24 have trained an automatic scheme for TIL quantification and reported a prognostic computational signature on non-small cell lung cancer tumors, but their scheme does not differentiate between geographical regions of the tumor. Saltz et al23 showed that a convolutional neural network identified spatial TIL maps as correlated with prognosis across tumor subtypes, and molecular immune profiles and survival in 11 cancer types, including cervical and endometrial. Our study is novel in that the cell arrangements are extracted from geographically different regions of tissue (stroma, epithelial nests and invasive tumor front), additionally this is the first study we are aware of evaluating the association with clinically meaningful endpoints in the context of different types of GC and across different treatment modalities.
The spatial hand-crafted TIL features presented in this study may explain the previously controversial results44–55 about the prognostic role of TIL density. Since T cells must form a direct cytotoxic immunological synapse with tumor cells to be able to target cell death receptors and induce immunologic memory against tumor antigens,33 TILs which are not adjacent to tumor cells may have a more limited effect on tumor growth. This might therefore explain why a more sophisticated biomarker based on localization of TILs, rather than solely density, could better predict patient outcomes in GC.33 This might also explain why the TIL density assessment of a subset of slides by three pathologists was not found to be prognostic of outcome. Further, the robustness of the ArcTIL model was confirmed by uniform manifold approximation and projection analysis showing no difference in the ArcTIL signature between different cancer types, treatment regimens, and populations (see online supplemental figure 4).
Currently, state of the art molecular approaches that relate to prognostic biomarkers in GC include a few immuno-oncology biomarkers, for example, programmed death-ligand 1 and tumor mutational burden (TMB).56 However, due to their poor predictive value and the fact that the majority of GC tend to have low TMB, these markers are not clinically relevant.57 In addition, we identified biological associations between ArcTIL and genomic pathways related to cancer progression, specifically cell–cell adhesion, and pathways that are implicated in triggering a cytotoxic synapse of TILs.32 33 Some of these pathways are involved in regulating immune response and carcinogenesis, and they could trigger the underlying mechanism responsible for ArcTIL-defined risk phenotypes.
We acknowledge that our study did have some limitations. The use of only retrospective cohorts somehow restrains a decisive finding. The ICI-treated patients particularly were from a single site and contained very few patients from each cancer type; clearly these findings will have to be validated in larger multi-institutional cohorts. Additionally, the patients with OC in this study had a median age 61 which could confound OS. A future study may use disease-specific survival, which was not available for these patients, to reduce the influence of this covariate. Finally, our data revealed the prognostic utility of our risk model. While we have not yet explicitly investigated the role of our model as a predictive biomarker, in our immunotherapy cohort of patients with GC, there was evidence that supported the potential role of this model to predict benefit from immunotherapy with ICI.
In spite of the aforementioned limitations, our data demonstrated that ArcTIL reliably identified high risk patients with GC likely to have recurrence and short-term survival after treatment with chemotherapy, radiotherapy, or immunotherapy. The ArcTIL approach is a completely tissue non-destructive digital assay that could potentially offer prognostic and predictive information for multiple different treatment regimens at a significantly lower cost compared with extant molecular assays, given that it only involves computational image analysis of a digitized H&E slide. Future work will involve validating ArcTIL in its ability to identify patients who will benefit from more intensive follow-up or therapy intensification.
Supplementary materials and methods
Training size sensitivity and disease type sensitivity analysis of the model performance
To ensure that the results are consistently irrespective of training set size and histology, we formulated four different experiments; for each, we trained a machine-learning model based off entirely different training cohorts:
E1: training cohort includes only a random split of OC (training size=51).
E2: training cohort includes only a random split of CC (training size=134).
E3: training cohort includes only a random split of OC (training size=252).
E4: training cohort includes all three types of cancers; a random split of all TCGA (training size=292, including (34 OC, 90 CC, and 168 EC)).
E1 is the primary model described in the paper and one which was trained on only one specific cancer type (ovarian). In these supplementary experiments, we performed additional experiments (E2, E3, and E4) and compared the C-index values across all validation sets.
Experiments E2 and E3 are disease-specific (trained on cervical and endometrial cases, respectively), and E4 comprises a combination of all GCs which were considered for learning and discovery. Therefore, in each experiment, both the population of the learning set and the histology of the diseased organ was different.
Inter-reviewer variability of human TIL grading
As a comparative strategy, we assessed the prognostic ability of TIL counts determined by human experts against the prognostic ability of ArcTIL model. Three pathologists (P1, P2, and P3) blinded to outcome assessed each WSI of D6–D8 and scored an immune infiltration profile by counting TILs in the same three 475 by 228 micron fields of view (FOV) selected by P2 from tumor content. Experience among the pathologists varied; P1 practiced as a general pathologist for 1 year, P3 as a general pathologist for >20 years, and the P2 as a subspecialty gynecologic pathologist for 3 years. To split the patients based on each pathologist’s opinion into high-TIL versus low-TIL count, the average TIL count over the three FOV was categorized into three quantiles, with the highest one considered high-TIL-infiltration and the other two considered low-TIL cases. This strategy was based on approaches previously described in.19 58 The prognostic ability of the model based on pathologist quantification was analyzed by survival analysis on D6–D8 and was compared with ArcTIL. The agreement among pathologists on the TIL-grading task was measured via the one-way intraclass correlation coefficient (ICC), which is widely used to evaluate agreement among a set of experts making non-categorical judgments.59
Investigating the ArcTIL features and biological pathways
We investigated the relationship between the ArcTIL features and the corresponding genotype in OC, CC, and EC separately. For OC, CC, EC TCGA cohort (D0–D1, D2–D3, and D4–D5), messenger RNA sequencing gene expression data using robust multichip average were available for N=84, 267, and 261 patients, respectively. The gene expression data (Level 3-Affymetrix HT HG U133A) were obtained from the Broad Institute public repository.60 An empirical analysis using the Wilcoxon rank-sum test (WRST) of the 12,042 annotated genes across the high-risk and low-risk patients, corrected by the Benjamini and Hochberg method61 for controlling false discovery rate, yielded a set of differentially expressed genes (DEGs). Utilizing these DEGs, Gene Ontology analysis was performed to elucidate underlying biological processes,62 63 which structures and classifies genes based on the known molecular and cellular biological processes and provides the relationship between those processes. These pathways were chosen on the basis of their biological significance in regulating immune response, cell adhesion, and carcinogenesis. Furthermore, single sample gene set enrichment analysis (ssGSEA) was employed to determine how often members of a set of genes in a pathway reflected differences between two risk strata.64 The lists of genes involved in each pathway were obtained from the Molecular Signatures Database.65 Then, using the gene expression levels and pathways of interest, we obtained pathway enrichment scores for each patient. Ultimately, a pairwise WRST on enrichment scores was performed across high-ArcTIL and low-ArcTIL feature groups to capture the strength of association between the pathway enrichment score and the feature values.
Evaluating resilience of ArcTIL features
Uniform manifold approximation and projection (UMAP) embedding66 was performed separately on each of D1–D8 to assess that the ArcTIL signature worked resiliently on all three diseases and each of the therapy regimens. This also assessed the innate variation between different populations and source site of images, which reflects if scanner parameters, staining, and other batch effects can potentially alter this imaging signature. UMAP reduced the feature space from 21,096 to two-dimensions (2D). If distinct clusters emerged in the 2D space, and those clusters corresponded to a specific population, disease, treatment, or institute, this would suggest those attributes affected the ArcTIL model. Whereas, a resilient imaging signature that consisted of a robust set of features would produce a homogeneous distribution of labels in the 2D space.
Investigating prognostic ability of clinicopathology features
By fitting the multivariable Cox proportional hazards model with the combination of ArcTIL and age, race, stage, we investigated the independent prognostic ability of ArcTIL for OS and PFS after accounting for these clinical factors. The univariate analyses were further performed using each one of the aforementioned factors at a time. The independent discriminative value of ArcTIL on OS and PFS was measured comparing the two models.
Training size sensitivity and disease type sensitivity analysis of the model performance
Additional experiments performed (E2, E3, and E4) included more patient cases compared with E1 for learning and discovery. As may be appreciated from online supplemental table 1, while the model validation performance metrics (including C-indices, HR, p) did indeed change for each experiment, these changes were minimal. Also, most importantly, in each instance, the combination of selected features included at least one feature from each compartment (namely stroma (S), epithelium (E), and tumor invasive front (T)). In summary, varying the number of patients in our training set does not result in significant changes in model performance.
Comparing the prognostic ability of ArcTIL versus human
ICC between counts from each pair of readers (P1–P2, P2–P3, and P1–P3) were, respectively, 0.0339, 0.1884, and 0.6955. TIL counts provided by each pathologist were used to split D6–D8 into high-TIL and low-TIL count groups. No pathologist-based model was prognostic of PFS (HR_P1=1.34, 95% CI=(0.71 to 2.51), p=0.4; HR_P2=1.26, 95% CI=(0.67 to 2.36), p=0.5; and HR_P3=1.29, 95% CI=(0.69 to 2.41), p=0.4) (online supplemental figure 3). In contrast, the ArcTIL signature was prognostic (HR=2.98, 95% CI=(1.31 to 6.78), p=0.009). Raw average TIL counts at online supplemental table 2.
Histo-genomic analysis of ArcTIL features and biological pathways
The analysis of the 12,042 annotated genes across the chemotherapy patients with OC (n=84, subset of D0 and D1) resulted in 368 DEGs between the two ArcTIL risk strata (the list of all DEGs is presented in online supplemental table 3. Gene Ontology analysis structured these DEGs and identified 22 central biological pathways that were significantly correlated with ArcTIL risk score (online supplemental table 4). Additional results incorporating the data for OC, CC, and EC describe the biological process, cellular components, molecular functions, and molecular pathways sorted according to their enrichment scores (online supplemental figure 4). For the OC cohort, gene set annotations for the 22 biological processes that were extracted by Gene Ontology analysis were used to calculate ssGSEA scores for each of the seven prognostic ArcTIL features (online supplemental figure 5). The correlation between each ArcTIL feature and biological pathway is shown in online supplemental figure 5 (and online supplemental table 6) and suggests that gene sets corresponding to blood vessel morphogenesis, vasculature development, regulation of cell differentiation, cell-substrate adhesion, biological adhesion, regulation of vasculature development, and angiogenesis were strongly associated with at least one of the epithelium specific ArcTIL features.
Analyzing batch effect
UMAP (online supplemental figure 6). However, the patients did not cluster based on the organ of malignancy, therapy regimen, race, or tissue source sites, suggesting that the ArcTIL signature was resilient to these factors. Despite the variations in ArcTIL values and clinical and pathological features, univariate and multivariable analyses illustrated a consistently prognostic value for OS and PFS (online supplemental tables 7,8). It can be inferred that the prognostic value of none of the clinical variables (even the stage variable, which is the current standard for gynecologic oncology37) is consistently prognostic in every validation cohort. Also, in online supplemental table 7, we investigated the association of each ArcTIL feature with regard to the clinical endpoints (OS in D1–D5 and PFS in D6–D8) in all cohorts, using univariate models. Even though the individual features are not always significantly associated with clinical outcome, the combined ArcTIL score is.
Data availability statement
All data relevant to the study are included in the article or uploaded as supplementary information.
Patient consent for publication
The cohorts were from publicly available TCGA. Participants gave informed consent to participate in the study before taking part.
Correction notice This article has been corrected since it was first published. Haider Mahdi has been added as a corresponding author.
Contributors All authors confirm that they had full access to all the data in the study and accept responsibility to submit for publication. SA and AM act as the guarantors of the study.
Funding Research reported in this publication was supported by the National Cancer Institute under award numbers 1U24CA199374-01, R01CA249992-01A1, R01CA202752-01A1, R01CA208236-01A1, R01CA216579-01A1, R01CA220581-01A1, R01CA257612-01A1, 1U01CA239055-01, 1U01CA248226-01, 1U54CA254566-01, National Heart, Lung and Blood Institute 1R01HL15127701A1, R01HL15807101A1, National Institute of Biomedical Imaging and Bioengineering 1R43EB028736-01, National Center for Research Resources under award number 1 C06 RR12463-01, National Science Foundation GRFP: CON501692, VA Merit Review Award IBX004121A from the US Department of Veterans Affairs Biomedical Laboratory Research and Development Service, the Office of the Assistant Secretary of Defense for Health Affairs, through the Breast Cancer Research Program (W81XWH-19-1-0668), the Prostate Cancer Research Program (W81XWH-15-1-0558, W81XWH-20-1-0851), the Lung Cancer Research Program (W81XWH-18-1-0440, W81XWH-20-1-0595), the Peer Reviewed Cancer Research Program (W81XWH-18-1-0404, W81XWH-21-1-0345), the Kidney Precision Medicine Project (KPMP) Glue Grant, the Ohio Third Frontier Technology Validation Fund, the Clinical and Translational Science Collaborative of Cleveland (UL1TR0002548) from the National Center for Advancing Translational Sciences (NCATS) component of the National Institutes of Health and NIH roadmap for Medical Research, the Wallace H Coulter Foundation Program in the Department of Biomedical Engineering at Case Western Reserve University. Sponsored research agreements from Bristol Myers Squibb, Boehringer Ingelheim, and AstraZeneca. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, the US Department of Veterans Affairs, the Department of Defense, or the US Government. All authors confirm that they had full access to all the data in the study and accept responsibility to submit for publication.
Competing interests AM is an equity holder in Elucid Bioimaging and in Inspirata. In addition, he has served as a scientific advisory board member for Inspirata, AstraZeneca, Bristol Meyers Squibb and Merck. Currently he serves on the advisory board of Aiforia and currently consults for Caris, Roche and Aiforia. He also has sponsored research agreements with Philips, AstraZeneca, Boehringer Ingelheim and Bristol Meyers Squibb. His technology has been licensed to Elucid Bioimaging. He is also involved in a NIH U24 grant with PathCore, and three different R01 grants with Inspirata. Other authors declare no potential conflicts of interest.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.