Background Predicting a patient's response to immune checkpoint inhibitors (ICIs) could clarify the benefit-risk profile of treatment, improve clinical trial cohort selection, and inform care pathway decisions in clinical practice. Recently, machine learning (ML)-based predictive analytics have gained momentum in this area, but models trained or evaluated on multi-center data are still rare. Therefore, it is difficult to assess the generalizability of single-center models.
We present the results of an external validation, on a large European cohort, of an ML framework trained on US data.
Methods Random forest classification models were built to predict overall survival (OS) greater than 100 days, one year, and three years, and to predict the occurrence of hepatitis within six weeks, 90 days, and one year after initiation of ICI treatment. For model training, we used routinely available real-world data from Vanderbilt University Medical Center (data cut-off December 31, 2018) covering a pan-cancer cohort of more than 2,200 patients with localized as well as metastatic tumors. Structured, routine clinical data such as age, laboratory values, medication history, and condition codes were used as model features. Feature engineering involved aggregating laboratory measurements acquired over a 120-day time window; a one-year window was applied to other data types. The binary hepatitis label was set to 1 if any liver enzyme exceeded three times the upper limit of normal (table 1).
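The windowed aggregation and the hepatitis label described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the tuple layout of a measurement, the choice of mean/min/max/last as summary statistics, and the assumption that the 120-day window ends at treatment initiation are all hypothetical, since the abstract does not specify them. Only the 120-day window and the three-times-ULN threshold come from the text.

```python
from datetime import date, timedelta
from statistics import mean


def aggregate_labs(measurements, treatment_start, window_days=120):
    """Summarize each lab test over the window preceding treatment start.

    measurements: iterable of (test_name, measured_on, value) tuples.
    Assumption: the window ends at treatment start, and mean/min/max/last
    are the summaries (the abstract does not name the statistics used).
    """
    window_open = treatment_start - timedelta(days=window_days)
    by_test = {}
    for name, measured_on, value in measurements:
        if window_open <= measured_on <= treatment_start:
            by_test.setdefault(name, []).append((measured_on, value))

    features = {}
    for name, points in by_test.items():
        points.sort()  # chronological order, so the last entry is the most recent
        values = [v for _, v in points]
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
        features[f"{name}_last"] = values[-1]
    return features


def hepatitis_label(measurements, uln, treatment_start, horizon_days, factor=3.0):
    """Return 1 if any liver enzyme exceeds factor * ULN within the horizon, else 0.

    uln: dict mapping liver enzyme name to its upper limit of normal.
    Tests not listed in uln are ignored.
    """
    horizon_end = treatment_start + timedelta(days=horizon_days)
    for name, measured_on, value in measurements:
        if name in uln and treatment_start < measured_on <= horizon_end:
            if value > factor * uln[name]:
                return 1
    return 0
```

For the six-week endpoint, `horizon_days` would be 42; the 90-day and one-year endpoints reuse the same function with a different horizon. The enzyme names and ULN values passed in are site-specific inputs, not values taken from the study.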
The trained models were evaluated in an external retrospective pan-cancer cohort of the University Hospital Essen, Germany (n=4257). All input variables were extracted from a FHIR database using FHIRPACK.1 Containerized models were employed for data integration and model evaluation.
Results Our random forest models achieved an AUC of up to 0.79 for the prediction of OS and up to 0.81 for the prediction of hepatitis in the training data. On the external evaluation cohort, the models retained at least 90% of their performance for the OS endpoints and at least 86% for the hepatitis endpoints (tables 2 and 3).
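As a rough illustration of how such a retention figure could be computed, the sketch below derives AUC from pairwise comparisons and expresses retention as the ratio of external to training AUC. The ratio definition is an assumption; the abstract does not state how retention was calculated, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def performance_retention(auc_train, auc_external):
    """Fraction of training AUC preserved on the external cohort (assumed definition)."""
    return auc_external / auc_train
```

Under this definition, a value of 0.90 means the external AUC is 90% of the training AUC. An alternative would be to measure retention relative to the chance level of 0.5, which yields lower percentages for the same pair of AUCs.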
Conclusions To our knowledge, this is the first large-scale external cohort evaluation of ML models for OS and hepatitis prediction in ICI patients. Despite the different geographic origins of the two cohorts, our models generalized well to unseen data. In particular, the short-term models showed remarkable performance retention when applied to the external cohort. Our work demonstrates the potential of ML models as tools for pre-screening eligible patients for ICI clinical trials and as clinical decision support for routine patient management.
Ethics Approval Ethics approval for using the Vanderbilt University cohort was granted by the Vanderbilt University Medical Center Health Sciences #3 Institutional Review Board, tracked as #211814. The IRB determined that the study poses minimal risk to participants, and a waiver of consent was granted.
The study using the external validation cohort was approved by the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (No. 21–10347-BO). The requirement for written informed consent was waived due to the retrospective design of the study.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.