Abstract
Background Examination of histopathology slides is a crucial step in cancer diagnosis and treatment decisions. Rapid developments in machine learning for digital pathology have made it possible to extract quantitative, high-resolution information from whole-slide images. Genomic tests and molecular assays have also become powerful aids to pathologists and oncologists in decision making, but they are not performed routinely enough to provide molecular information consistently. In this study, we developed tissue and cell classification models using hematoxylin and eosin (H&E)-stained slides, extracted human-interpretable features (HIFs) quantifying the tumor microenvironment, and investigated the association between the abundance and distribution of tumor-infiltrating lymphocytes (TILs) and molecular phenotypes.
Methods We trained convolutional neural network-based tissue and cell classification models on H&E slides annotated by US board-certified pathologists, resulting in PathExplore models specific to eight indications: breast cancer, colorectal cancer, gastric cancer, melanoma, non-small cell lung cancer, pancreatic cancer, prostate cancer, and renal cell carcinoma. We deployed the models on the corresponding indications in The Cancer Genome Atlas (TCGA) data and quantified HIFs for over 5,000 slides across 13 cancer types. We then analyzed the TIL-associated HIFs together with publicly available gene expression and immune signature data.
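To make the idea of a TIL-associated HIF concrete, the following is a minimal sketch of how a feature such as the frequency of TILs within cancer tissue might be computed from per-cell model outputs. It is an illustration, not the PathExplore implementation: the function name `ctil_frequency`, the label strings, the `(cell_type, tissue_type)` input layout, and the choice of denominator (all cells located in cancer tissue) are assumptions made here.

```python
from collections import Counter


def ctil_frequency(cells, lymphocyte_label="lymphocyte", cancer_label="cancer"):
    """Fraction of cells inside cancer tissue that are lymphocytes.

    `cells` is an iterable of (cell_type, tissue_type) pairs for one slide.
    The label strings and the denominator are illustrative assumptions,
    not definitions taken from the abstract.
    """
    # Keep only the cells that the tissue model places in cancer tissue.
    in_cancer = [cell_type for cell_type, tissue in cells if tissue == cancer_label]
    if not in_cancer:
        return None  # feature is undefined when no cancer tissue is present
    return Counter(in_cancer)[lymphocyte_label] / len(in_cancer)


# Hypothetical per-cell predictions for one slide.
toy_slide = [
    ("lymphocyte", "cancer"),
    ("cancer_cell", "cancer"),
    ("cancer_cell", "cancer"),
    ("lymphocyte", "stroma"),
    ("fibroblast", "stroma"),
    ("lymphocyte", "cancer"),
]
toy_value = ctil_frequency(toy_slide)
```

Other normalizations, such as cell counts per unit area of cancer tissue, would be equally plausible; the abstract does not specify which is used.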
Results TIL-associated HIFs, such as the frequency of TILs within cancer tissue (cTIL frequency), were correlated with the expression of known lymphocyte marker genes, including CD8A (median Spearman ρ = 0.539 across individual indications), CD3G (ρ = 0.536), and CD2 (ρ = 0.536). Regularized regression models using a panel of TIL-associated HIFs predicted median-binarized expression of these three genes (median AUROC 0.751–0.755 across individual indications, pan-cancer AUROC 0.775–0.782), with the best performance in melanoma (AUROC 0.831–0.850). cTIL frequency also correlated with immune signature scores derived from gene expression, including a published lymphocyte infiltration signature score [1] (ρ = 0.504) and a T-cell signature score [2] (ρ = 0.409). In particular, classification models using TIL-associated HIFs predicted the inflammatory subtype (C3 subtype in [1]; median AUROC = 0.691, pan-cancer AUROC = 0.765 ± 0.008, 5-fold cross-validation) and the immune-enriched non-fibrotic subtype (IE subtype in [2]; median AUROC = 0.755, pan-cancer AUROC = 0.737 ± 0.017).
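The two statistics reported above, Spearman rank correlation and AUROC against median-binarized expression, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: it scores a single HIF directly rather than fitting the regularized multi-feature models described in the abstract, it uses no cross-validation, and the data values and helper names (`average_ranks`, `spearman_rho`, `median_binarize`, `auroc`) are hypothetical.

```python
from statistics import mean, median


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def median_binarize(values):
    """Label each sample 1 if strictly above the median, else 0."""
    cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (ties get half credit)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical per-slide values, not data from the study.
ctil = [0.02, 0.10, 0.05, 0.30, 0.18, 0.07]
cd8a = [1.1, 3.9, 2.0, 6.5, 3.0, 4.2]

rho = spearman_rho(ctil, cd8a)
auc = auroc(ctil, median_binarize(cd8a))
```

In a fuller analysis the single-feature score would be replaced by the cross-validated predictions of a regularized model fitted to a panel of HIFs, with the AUROC computed on held-out folds.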
Conclusions Histopathology image-based quantification of TILs is consistently associated with immune phenotypes derived from molecular measurements. These results suggest that quantitative HIFs extracted from tissue and cell classification models provide rich information for understanding of inflammation in the tumor microenvironment and potential discovery of immune biomarkers.
Acknowledgements The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
References
1. Thorsson V, et al. The Immune Landscape of Cancer. Immunity. 2018;48(4):812–830.
2. Bagaev A, et al. Conserved pan-cancer microenvironment subtypes predict response to immunotherapy. Cancer Cell. 2021;39:845–865.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.