Abstract
Background Multi-modal alignment techniques such as CLIP (Contrastive Language-Image Pretraining) have transformed the way modern machine learning algorithms learn rich, nuanced shared representations across complex datasets.[2] Recently, CLIP-style methods have been extended to cancer biology, enabling the integration of complex biological datasets such as histopathology images and gene expression data. This integration gives researchers a more detailed and comprehensive view of the interactions and relationships in multi-modal biological data, and it allows the identification of complex patterns that can improve diagnosis, treatment, and our overall understanding of tumor heterogeneity. Methods such as BLEEP (Bi-modaL Embedding for Expression Prediction) and TANGLE (Transcriptomics-guided Slide Representation Learning) have demonstrated the feasibility of using contrastive alignment to learn shared representations between histopathology images and gene expression profiles for both spatially resolved and bulk datasets.[1,3] These approaches have also demonstrated the potential to reconstruct gene expression profiles from the learned shared representation of an image alone. Our research builds on these concepts by investigating and refining these methodologies to enhance the reliability and interpretability of the learned representations, with the aim of developing predictive models that bridge the gap between image-based diagnostics and molecular-level insights in cancer biology.
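For concreteness, the sketch below shows the symmetric contrastive (InfoNCE) objective that underlies CLIP-style alignment of two modalities; the function and variable names are illustrative and are not drawn from the CLIP, BLEEP, or TANGLE codebases.

# Minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) objective
# aligning image and gene expression embeddings. Names are illustrative.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, expr_emb, temperature=0.07):
    # L2-normalize both modalities so dot products are cosine similarities
    image_emb = F.normalize(image_emb, dim=-1)
    expr_emb = F.normalize(expr_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i to expression j
    logits = image_emb @ expr_emb.t() / temperature

    # Matched pairs lie on the diagonal; treat alignment as classification
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    loss_i2e = F.cross_entropy(logits, targets)      # image -> expression
    loss_e2i = F.cross_entropy(logits.t(), targets)  # expression -> image
    return (loss_i2e + loss_e2i) / 2

Minimizing this loss pulls each image embedding toward its paired expression embedding while pushing it away from all other profiles in the batch, which is what produces the shared representation discussed above.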
Methods We employed state-of-the-art contrastive alignment techniques inspired by CLIP, BLEEP, and TANGLE to develop a machine learning algorithm that predicts EGFR mutations in non-small cell lung cancer (NSCLC) from H&E-stained histopathology images. The dataset comprises 111 H&E images from TCGA-LUAD (The Cancer Genome Atlas – Lung Adenocarcinoma), divided between 68 EGFR mutation-negative images and 43 EGFR mutation-positive images.[4]
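A hypothetical sketch of the downstream mutation classifier is shown below: a linear probe trained on frozen image embeddings from the expression-aligned encoder. The embedding width, optimizer settings, and label encoding are assumptions, not details reported in the abstract.

# Hypothetical linear probe for EGFR mutation status on frozen,
# expression-aligned image embeddings. Dimensions and hyperparameters
# are assumptions for illustration only.
import torch
import torch.nn as nn

embed_dim = 512                      # assumed embedding width
probe = nn.Linear(embed_dim, 2)      # 0 = mutation-negative, 1 = EGFR-positive
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(image_emb, labels):
    # image_emb: (batch, embed_dim) features from the frozen image encoder
    # labels: (batch,) integer mutation labels
    optimizer.zero_grad()
    loss = criterion(probe(image_emb), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

Keeping the encoder frozen isolates the question the study asks: whether expression-guided pretraining alone yields image features that are predictive of mutation status.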
Results Preliminary results show that our models successfully learn to align embeddings across modalities. This alignment is crucial because it facilitates downstream prediction of gene mutations and reconstruction of gene expression profiles from histopathology images. Additional experiments are needed to comprehensively evaluate the robustness and interpretability of the aligned embeddings across diverse patient cohorts and genetic mutations.
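To illustrate the reconstruction step, the sketch below follows the retrieval idea used by BLEEP: embed a query image, find its nearest expression embeddings in a reference cohort within the shared space, and average their measured profiles. The tensor shapes and the neighborhood size k are assumptions.

# Sketch of gene expression reconstruction from an image embedding alone,
# in the spirit of BLEEP's retrieval approach. Shapes and k are assumptions.
import torch
import torch.nn.functional as F

def reconstruct_expression(query_img_emb, ref_expr_emb, ref_expr_profiles, k=50):
    # query_img_emb: (d,) embedding of a query image patch
    # ref_expr_emb: (n, d) expression embeddings of a reference cohort
    # ref_expr_profiles: (n, g) measured expression values over g genes
    query = F.normalize(query_img_emb, dim=-1)
    ref = F.normalize(ref_expr_emb, dim=-1)
    sims = ref @ query                      # cosine similarity to each reference
    topk = sims.topk(k).indices             # k nearest neighbors in shared space
    return ref_expr_profiles[topk].mean(0)  # averaged profile as the prediction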
Conclusions Gene expression-guided representation learning has improved the ability of computer vision models to extract robust features from H&E images alone. These ongoing efforts aim to improve our understanding of NSCLC mutations from histopathology images and are expected to play a crucial role in screening tools for clinical decision making. The ability to reconstruct gene expression profiles from the shared representation of images alone will allow researchers to extract multi-modal features from a single modality.
References
1. Xie R, Pang K, Chung SW, Perciani CT, MacParland SA, Wang B, et al. Spatially resolved gene expression prediction from H&E histology images via bi-modal contrastive learning. arXiv:2306.01859 [Preprint]. 2023 [cited 2024 Jun 26]. Available from: https://arxiv.org/abs/2306.01859
2. Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, et al. Learning transferable visual models from natural language supervision. arXiv:2103.00020 [Preprint]. 2021 [cited 2024 Jun 26]. Available from: https://arxiv.org/abs/2103.00020
3. Jaume G, Oldenburg L, Vaidya A, Chen RJ, Williamson DFK, Peeters T, et al. Transcriptomics-guided slide representation learning in computational pathology. arXiv:2405.11618 [Preprint]. 2024 [cited 2024 Jun 26]. Available from: https://arxiv.org/abs/2405.11618
4. The Cancer Genome Atlas Program (TCGA). National Cancer Institute (NCI). Available from: https://www.cancer.gov/tcga
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.