Abstract
Background Immune checkpoint inhibitors (ICIs) have transformed cancer therapy, yet response rates remain modest, ranging from 20% to 40% across cancer types [1]. There is a critical need for predictive tools that optimize treatment, avoid unnecessary side effects and identify the patients most likely to respond to ICIs. Toward this goal, we developed ImmunoBERT, a novel machine learning model for predicting overall survival (OS) in cancer patients treated with ICIs, which takes as input clinical and molecular data currently available in real-world settings.
Methods We curated a comprehensive clinicogenomics dataset of cancer patients treated with anti-PD1 and anti-CTLA4 checkpoint therapies (n=1700), together with ICI drug structure embeddings and binding affinity profiles of ICI drug targets. Using this dataset, we trained ImmunoBERT (figure 1), which leverages large language models (LLMs) [2] and ProteinBERT (a deep learning model built upon the classic Transformer/BERT architecture) [3] to learn a generalization between ICI drugs, their protein targets, clinically available genomics data and patient outcome. The ImmunoBERT architecture also leveraged correlations and higher-order interactions between 220 genes commonly sequenced on commercial NGS panels to reconstruct features that improved the accuracy of ICI survival prediction, including n=32 tumor microenvironment (TME) features, tumor mutational burden (TMB) and PDL1 expression.
Results ImmunoBERT was benchmarked against the top-performing machine learning models from the Anti-PD1 Response Prediction DREAM Challenge [4]. The concordance index (C-index) was used to evaluate and compare the predictive accuracy of ImmunoBERT and the other survival models; a higher C-index indicates better predictive accuracy. ImmunoBERT (C-index=0.69) outperformed the top DREAM Challenge models (C-index=0.6348 for the top submission) in predicting patient response to ICI therapies.
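For readers unfamiliar with the metric, the following is a minimal sketch of how a C-index can be computed for right-censored survival data. It is an illustration of Harrell's formulation under simplifying assumptions, not the evaluation code used in this study: the function name `concordance_index`, the toy inputs, and the choice to skip pairs with tied follow-up times are all assumptions made for the example.

```python
from itertools import permutations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in permutations(range(len(times)), 2):
        # A pair is comparable only when the patient with the shorter
        # follow-up actually had the event. Pairs with tied times are
        # skipped here, which is a simplification.
        if times[i] < times[j] and events[i]:
            comparable += 1
            if risk_scores[i] > risk_scores[j]:
                concordant += 1.0   # higher risk, shorter survival
            elif risk_scores[i] == risk_scores[j]:
                concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example with hypothetical values: four patients, one censored.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(times, events, risk_scores))  # 0.75
```

In this toy example, four pairs are comparable and three are ranked correctly, giving 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.69 and 0.6348 figures above are compared.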
Conclusions Our study demonstrates the value of integrating biologically relevant factors, such as drug structure, target binding affinity and genomic information, into machine learning models to improve the accuracy of ICI response predictions. Moreover, by effectively leveraging real-world clinicogenomics data, we were able to reconstruct additional biologically relevant features, including TME characteristics, which further improved both the performance and the interpretability of ImmunoBERT over current models. ImmunoBERT offers improved ICI prognostic capabilities, facilitating personalized treatment decisions for these promising drugs and enhancing patient care.
Acknowledgements The authors would like to acknowledge members of the Zephyr AI science, engineering and data teams.
References
1. Sharma P, Hu-Lieskovan S, Wargo JA, Ribas A. Primary, Adaptive, and Acquired Resistance to Cancer Immunotherapy. Cell. 2017 Feb 9;168(4):707–23.
2. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234–40.
3. Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics. 2022 Apr 12;38(8):2102–10.
4. Vincent BG, Szustakowski JD, Doshi P, Mason M, Guinney J, Carbone DP. Pursuing Better Biomarkers for Immunotherapy Response in Cancer Through a Crowdsourced Data Challenge. JCO Precis Oncol. 2021 Nov;5:51–4.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.