Abstract
Background Immune checkpoint inhibitors (ICIs) are a pillar of cancer therapy with demonstrated efficacy in a variety of malignancies. However, they are associated with immune-related adverse events (irAEs) that affect many organ systems with varying severity, impairing patient quality of life and, in some cases, the ability to continue immunotherapy. Research into irAEs is nascent, and identifying patients with adverse events poses a critical challenge for future research efforts and patient care. This study's objective was to develop an electronic health record (EHR)-based model to identify and characterize patients with ICI-associated arthritis (checkpoint arthritis).
Methods In a single-center retrospective study, forty-two patients with checkpoint arthritis were identified by chart abstraction from a cohort of all patients who received checkpoint therapy for cancer (n=2,612). All EHR clinical codes (N=32,198) were extracted, including International Classification of Diseases (ICD)-9 and ICD-10, Logical Observation Identifiers Names and Codes (LOINC), RxNorm, and Current Procedural Terminology (CPT) codes. Logistic regression, random forest, gradient boosting, support vector machine, K-nearest neighbors, and neural network machine learning models were trained to identify checkpoint arthritis patients using these clinical codes. Models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC), and the most important variables were determined from the logistic regression model. Models were then retrained on progressively smaller fractions of the most important variables to determine the minimum variable set necessary to accurately identify checkpoint arthritis.
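The evaluation and variable-reduction steps described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy data, and the choice to rank variables by the absolute value of their logistic regression coefficients are all assumptions made for illustration, since the abstract does not state how importance was computed.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC computed as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_fraction(coefficients: Dict[str, float], fraction: float) -> List[str]:
    """Return the codes with the largest absolute coefficients.

    Assumption: importance = |coefficient|. At least one code is kept.
    """
    k = max(1, int(len(coefficients) * fraction))
    ranked = sorted(coefficients, key=lambda code: abs(coefficients[code]), reverse=True)
    return ranked[:k]


# Toy example with hypothetical code names and made-up coefficients.
toy_coefficients = {"lab_a": 0.2, "drug_b": -1.5, "dx_c": 0.7, "proc_d": 0.0}
selected = top_fraction(toy_coefficients, 0.5)  # keeps the 2 largest by magnitude

toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

In a full pipeline, each model would be refit using only the selected codes and its ROC-AUC recomputed, repeating over a range of fractions to find the smallest set that preserves performance.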
Results Logistic regression and random forest were the highest-performing models on the full variable set of 32,198 clinical codes (AUCs 0.911 and 0.894, respectively; table 1). Retraining the models on smaller fractions of the most important variables demonstrated peak performance using the top 31 clinical codes, or approximately 0.1% of the total variables (figure 1). The most important features included the presence of ESR, CRP, rheumatoid factor lab, prednisone, joint pain, creatine kinase lab, thyroid labs, and immunization, all positively associated with checkpoint arthritis (figure 2).
Conclusions Our study demonstrates that a data-driven, EHR-based approach can robustly identify checkpoint arthritis patients. The high performance of the models using only the 0.1% most important variables suggests that only a small number of clinical attributes are needed to identify these patients. The variables most important for identifying checkpoint arthritis included several unexpected clinical features, such as thyroid labs and immunization, indicating potential underlying irAE associations that warrant further exploration. Finally, given its flexibility and demonstrated effectiveness, this approach could be applied to identify and characterize other irAEs.
Ethics Approval This study was approved by the Northwestern University Institutional Review Board, ID STU00210502, with a granted waiver of consent.
AUC was calculated from the ROC curve. Sensitivity, specificity, PPV, and NPV were determined at the threshold maximizing the F1-score. AUC = area under the curve; ROC = receiver operating characteristic; PPV = positive predictive value; NPV = negative predictive value.
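As an illustration of the thresholding described in this note, the sketch below scans candidate thresholds, keeps the one with the highest F1-score, and reports sensitivity, specificity, PPV, and NPV at that threshold. It is a hedged example rather than the study's implementation: the function names and toy data are hypothetical, a case is assumed to be predicted positive when its score is at or above the threshold, and ties in F1 are assumed to resolve to the lowest threshold.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator: float, denominator: float) -> float:
    """Division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics_at_best_f1(labels: Sequence[int], scores: Sequence[float]) -> Dict[str, float]:
    """Pick the threshold with the highest F1-score and report metrics there."""
    best: Dict[str, float] = {}
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
        f1 = safe_div(2 * tp, 2 * tp + fp + fn)
        # Strict ">" keeps the lowest threshold when F1 values tie.
        if not best or f1 > best["f1"]:
            best = {
                "threshold": threshold,
                "f1": f1,
                "sensitivity": safe_div(tp, tp + fn),
                "specificity": safe_div(tn, tn + fp),
                "ppv": safe_div(tp, tp + fp),
                "npv": safe_div(tn, tn + fn),
            }
    return best


# Toy data (made up): 3 positives and 3 negatives.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
toy_metrics = metrics_at_best_f1(toy_labels, toy_scores)
```

Because positives are rare in this setting (42 of 2,612 patients), F1 is a reasonable criterion for choosing an operating point: unlike accuracy, it ignores true negatives and so is not inflated by the large negative class.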