1294 External validation of machine learning models to predict efficacy and toxicity of immune checkpoint inhibitors using real-world pan cancer cohorts
  1. Zoltan Kiss1,
  2. Levente Lippenszky1,
  3. Balazs Laczi1,
  4. Pablo Napan-Molina2,
  5. Eszter Csernai1,
  6. Alexander Brehmer3,
  7. Moon Kim3,
  8. Julius Keyl3,4,
  9. Jens Siveke5,6,
  10. Moritz Meyer7,
  11. Viktor Grunwald8,
  12. Stefan Kasper5,9,
  13. Alexander Roesch5,10,11,
  14. Martin Schuler5,9,
  15. Travis Osterman12,
  16. Jan Wolber13 and
  17. Jens Kleesiek3,5,14,15
  1. 1Science and Technology Organization – Artificial Intelligence and Machine Learning, GE HealthCare, Budapest, Budapest, Hungary
  2. 2Science and Technology Organization – Artificial Intelligence and Machine Learning, GE HealthCare, San Ramon, CA, USA
  3. 3Institute for AI in Medicine (IKIM), University Medicine Essen, Essen, Nordrhein-Westfalen, Germany
  4. 4Institute of Pathology, University Hospital Essen (AöR), University of Duisburg-Essen, Essen, Nordrhein-Westfalen, Germany
  5. 5German Cancer Consortium (DKTK), Partner Site Essen, Essen, Nordrhein-Westfalen, Germany
  6. 6Bridge Institute of Experimental Tumor Therapy (BIT), West German Cancer Center, University Hospital Essen (AöR), Essen, Nordrhein-Westfalen, Germany
  7. 7Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Essen, University of Duisburg-Essen, Essen, Nordrhein-Westfalen, Germany
  8. 8Clinic for Medical Oncology, Clinic for Urology, West German Cancer Center, University Hospital Essen (AöR), Essen, Nordrhein-Westfalen, Germany
  9. 9Department of Medical Oncology, West German Cancer Center, University Hospital Essen (AöR), University of Duisburg-Essen, Essen, Nordrhein-Westfalen, Germany
  10. 10Center for Medical Biotechnology (ZMB), University of Duisburg-Essen, Essen, Nordrhein-Westfalen, Germany
  11. 11Department of Dermatology, University Hospital Essen, West German Cancer Center, University Duisburg-Essen, Essen, Nordrhein-Westfalen, Germany
  12. 12Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
  13. 13Pharmaceutical Diagnostics, GE HealthCare, Chalfont St Giles, Buckinghamshire, UK
  14. 14Department of Physics, Technical University Dortmund, Dortmund, Nordrhein-Westfalen, Germany
  15. 15Cancer Research Center Cologne Essen (CCCE), West German Cancer Center Essen, Essen, Nordrhein-Westfalen, Germany
  • Journal for ImmunoTherapy of Cancer (JITC) preprint. The copyright holders for this preprint are the authors/funders, who have granted JITC permission to display the preprint. All rights reserved. No reuse allowed without permission.


Background Predicting a patient's response to immune checkpoint inhibitors (ICIs) could help clarify the benefit-risk profile of treatment, potentially improve clinical trial cohort selection, and may inform care pathway decisions in clinical practice. Machine learning (ML)-based predictive analytics have recently gained momentum in this area, but models trained or evaluated on multi-center data remain rare, which makes it difficult to assess the generalizability of single-center models.

We present the results of an external validation, on a large European cohort, of an ML framework trained on US data.

Methods Random forest classification models were built to predict overall survival (OS) greater than 100 days, one year, and three years, and to predict the occurrence of hepatitis within six weeks, 90 days, and one year after initiation of ICI treatment. For model training, we used routinely available real-world data from Vanderbilt University Medical Center (data cut-off December 31, 2018) covering a pan-cancer cohort of more than 2,200 patients with localized or metastatic tumors. Structured, routine clinical data such as age, laboratory values, medication history, and condition codes were used as model features. Feature engineering involved aggregating laboratory measurements acquired over a 120-day time window; a one-year window was applied to the other data types. The binary hepatitis label was set to 1 if any liver enzyme exceeded three times the upper limit of normal (table 1).
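The feature window and the hepatitis label described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration only: the 120-day window is taken to end at ICI initiation, the summary statistics (mean, min, max, last) are hypothetical choices, the enzyme names and upper limits of normal in `ULN` are placeholders (real limits are laboratory-specific), and the function names `aggregate_labs` and `hepatitis_label` are invented here.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical upper limits of normal (U/L); real limits are laboratory-specific.
ULN = {"ALT": 40.0, "AST": 40.0}


def aggregate_labs(measurements, index_date, window_days=120):
    """Summarize lab values measured in the window_days before index_date.

    measurements: iterable of (lab_name, measurement_date, value) tuples.
    Returns {lab_name: {"mean": ..., "min": ..., "max": ..., "last": ...}}.
    """
    start = index_date - timedelta(days=window_days)
    by_lab = {}
    # Sort by date so that the final value per lab is the most recent one.
    for name, when, value in sorted(measurements, key=lambda m: m[1]):
        if start <= when < index_date:
            by_lab.setdefault(name, []).append(value)
    return {
        name: {
            "mean": mean(vals),
            "min": min(vals),
            "max": max(vals),
            "last": vals[-1],
        }
        for name, vals in by_lab.items()
    }


def hepatitis_label(measurements, index_date, horizon_days, uln=ULN, factor=3.0):
    """Return 1 if any liver enzyme exceeds factor * ULN within
    horizon_days after index_date, otherwise 0."""
    end = index_date + timedelta(days=horizon_days)
    for name, when, value in measurements:
        if name in uln and index_date <= when <= end and value > factor * uln[name]:
            return 1
    return 0
```

With `index_date` set to the first ICI administration, `hepatitis_label(labs, index_date, 42)` would correspond to the six-week endpoint, and 90 or 365 days to the other two horizons.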

The trained models were evaluated in an external retrospective pan-cancer cohort from University Hospital Essen, Germany (n=4,257). All input variables were extracted from a FHIR database using FHIRPACK.1 Containerized models were employed for data integration and model evaluation.

Results Our random forest models achieved an AUC of up to 0.79 for the prediction of OS and up to 0.81 for the prediction of hepatitis in the training data. On the external evaluation cohort, the models retained at least 90% of their performance for the OS endpoints and 86% for the hepatitis endpoints (tables 2 and 3).
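The abstract does not state how performance retention was computed. One plausible reading, sketched below as an assumption, is the ratio of the external-cohort AUC to the AUC obtained on the development data. The function name `performance_retention` and the numbers in the usage line are hypothetical and are not results from the study.

```python
def performance_retention(internal_auc, external_auc):
    """Fraction of the internal AUC that is retained on the external cohort.

    Assumes retention is defined as external_auc / internal_auc.
    """
    if not 0.0 < internal_auc <= 1.0 or not 0.0 <= external_auc <= 1.0:
        raise ValueError("AUC values must lie in [0, 1], with internal_auc > 0")
    return external_auc / internal_auc


# Illustrative numbers only, not taken from tables 2 or 3.
retention = performance_retention(internal_auc=0.80, external_auc=0.72)
print(f"{retention:.0%}")
```

An alternative definition would subtract the chance level of 0.5 from both AUCs before taking the ratio, which yields lower retention values for the same pair of AUCs.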

Conclusions To our knowledge, this is the first large-scale, external cohort evaluation of OS and hepatitis prediction ML models in ICI patients. Despite the different geographic origins of the cohorts, our models generalized well to unseen data. In particular, short-term models showed remarkable performance retention when applied to an external cohort. Our work demonstrates the potential of ML models as valuable tools for pre-screening eligible patients for ICI clinical trials and as clinical decision support for routine patient management.



Ethics Approval Ethics approval for using the Vanderbilt University cohort was granted by the Vanderbilt University Medical Center Health Sciences #3 Institutional Review Board, tracked as #211814. The IRB determined that the study posed minimal risk to participants, and a waiver of consent was granted.

The study using the external validation cohort was approved by the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (No. 21–10347-BO). The requirement for written informed consent was waived due to the retrospective design of the study.

Abstract 1294 Table 1

Detailed information of features and labels used in the evaluated models

Abstract 1294 Table 2

Detailed results of the external evaluation for OS models

Abstract 1294 Table 3

Detailed results of the external evaluation for hepatitis models

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial.
