## Abstract

**Background** Immune effector cell-associated neurotoxicity syndrome (ICANS) is a clinical and neuropsychiatric syndrome that can occur days to weeks following administration of chimeric antigen receptor (CAR) T-cell therapy. Manifestations of ICANS range from encephalopathy and aphasia to cerebral edema and death. Because the onset and time course of ICANS are currently unpredictable, prolonged hospitalization for close monitoring following CAR T-cell infusion is a frequent standard of care.

**Methods** This study was conducted at Brigham and Women’s Hospital from April 2015 to February 2020. A cohort of 199 hospitalized patients treated with CAR T-cell therapy was used to develop a combined hidden Markov model and lasso-penalized logistic regression model to forecast the course of ICANS. Model development was done using leave-one-patient-out cross validation.

**Results** Among the 199 patients included in the analysis, 133 were male (66.8%), and the mean (SD) age was 59.5 (11.8) years. Ninety-seven patients (48.7%) developed ICANS, of whom 59 (29.6%) experienced severe (grade 3–4) ICANS. Median time of ICANS onset was day 9. Selected clinical predictors included maximum daily temperature, C reactive protein, IL-6, and procalcitonin. The model correctly predicted which patients developed ICANS and severe ICANS, respectively, with area under the curve of 96.7% and 93.2% when predicting 5 days ahead, and area under the curve of 93.2% and 80.6% when predicting the entire future risk trajectory looking forward from day 5. Forecasting performance was also evaluated over time horizons ranging from 1 to 7 days, using metrics of forecast bias, mean absolute deviation, and weighted average percentage error.

**Conclusion** The forecasting model accurately predicts risk of ICANS following CAR T-cell infusion and the time course ICANS follows once it has begun.

- Antigens
- Costimulatory and Inhibitory T-Cell Receptors
- Immunity, Cellular
- Receptors, Chimeric Antigen

## Data availability statement

Data are available in a public, open access repository.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.

#### WHAT IS ALREADY KNOWN ON THIS TOPIC

A prior multivariable logistic regression model predicted the occurrence of immune effector cell-associated neurotoxicity syndrome (ICANS) on days 3–5 with lower accuracy than our model.

#### WHAT THIS STUDY ADDS

Data from 199 patients receiving chimeric antigen receptor-T cell therapy were used to develop a forecasting model. The model provides well calibrated probabilities of ICANS onset and accurately predicts ICANS time course 1–7 days ahead.

#### HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

With further external validation, our forecasting model may be useful for triaging and resource allocation and may allow many patients to be safely discharged from the hospital earlier than is currently possible.

## Introduction

Chimeric antigen receptor (CAR) T-cell therapy has transformed the treatment of hematological malignancies, with primary approvals of CD19-directed or BCMA-directed CAR T-cell therapy for relapsed/refractory large B-cell lymphoma, B-cell acute lymphoblastic leukemia, follicular lymphoma, mantle cell lymphoma, and multiple myeloma. However, CAR T-cell therapy carries risk for complications. These include cytokine release syndrome (CRS) and immune effector cell-associated neurotoxicity syndrome (ICANS), which remain significant causes of CAR T-cell related morbidity and mortality.1 Among trials of the five CD19-targeted CAR T-cell products, 20%–70% of patients experienced grade 1 or higher ICANS, which may present as encephalopathy, aphasia, focal weakness, numbness, apraxia, seizures, or in rare cases, cerebral edema and death.2–9 Previous studies have shown that ICANS is only partially responsive to available treatments, and patients often require prolonged hospitalization and supportive care.10–13 Steroids remain the mainstay of treatment for ICANS, but they may affect the efficacy of CAR T-cell therapy14 and carry their own profile of adverse effects.

Prediction of which patients will develop ICANS, when, and what course ICANS will follow, remains challenging. The incidence and onset timing of ICANS are variable among cell products and malignancies, although onset usually occurs 3–10 days after cell infusion. Accurate prediction of which patients will develop ICANS, and of the onset time and subsequent time course, has the potential not only to affect management with steroids, but also to reduce unnecessary medical expenses and services, since those who do not develop ICANS may be hospitalized for an unnecessarily long duration.9 Prior research has uncovered risk factors associated with ICANS, including pre-existing neurological and medical disease, high malignancy burden, elevated intensity of lymphodepletion, high peak CAR T-cell expansion, and early and severe CRS.2 13 Several inflammatory biomarkers have been associated with increased occurrence and severity of ICANS, including C reactive protein (CRP), IL-6, and ferritin, but have limited specificity given their co-occurrence with CRS.2 9 11–13 15–18 Multivariable models have been developed to predict the risk of ICANS using both clinical factors and biomarkers,9 19 but do not attempt to predict onset timing or time course. In the present study, we addressed this gap by developing a statistical model to predict the risk and day of onset of ICANS, and to forecast ICANS scores up to 7 days into the future.

## Methods

### Study setting, participants, and clinical data

We conducted a retrospective observational cohort study of patients undergoing CD19-targeted or BCMA-targeted CAR T-cell therapy from April 2015 to February 2020 at Brigham and Women’s Hospital (BWH; Boston, Massachusetts, USA). We used two existing cohorts, the first a group of patients at high risk for ICANS (patients who had undergone EEG monitoring from 2016 to 2020 because of concern for ICANS following CAR T-cell infusion), and the second a control group of patients who received CAR T-cell therapy at BWH from 2015 to 2019 and did not develop ICANS.

For all patients who developed ICANS during hospitalization, the ICANS grade was determined for each day of hospitalization through chart review by two independent graders. Each grader used oncology, neurology, and nursing notes to determine the Immune Effector Cell-Associated Encephalopathy score (0–10), which was combined with scores in four other neurological domains to determine the final ICANS grade (0–4, with 4 most severe) in accordance with American Society for Transplantation and Cellular Therapy (ASTCT) consensus grading.20 All patients had ICANS=0 at the time of hospital discharge, and we assumed ICANS remained absent after discharge. Demographic information, oncological diagnosis, vital signs (max temperature per day) and laboratory values, where available (CRP, ferritin, white blood cell (WBC), IL-6, procalcitonin), were collected via review of the electronic medical record. CRS incidence, day of onset, and peak grade were identified via chart review and by calculating CRS grade based on its constituent factors pulled from the electronic medical record (presence and degree of hypotension, temperature >38°C, need for supplementary oxygen and/or positive pressure ventilation) according to ASTCT consensus grading guidelines.20

### Forecasting model development

#### Leave-one-patient-out cross validation

We developed a hidden Markov model (HMM) to forecast ICANS scores (figure 1). An HMM is a probabilistic model that has an internal state that cannot be observed directly, but is instead inferred through some probabilistic function of the observed data. To avoid overly optimistic (biased) estimates of model performance, we used a leave-one-patient-out cross validation approach for model training and evaluation. During model training, each of the 199 patients was ‘held out’ in turn, and an HMM was developed using only data from the remaining 198 patients. HMM training included estimating state transition and conditional observation probability matrices. During testing, the HMM was evaluated on the held-out patient, and performance statistics were calculated by comparing the forecast results with the observed ICANS time course for that patient.
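The held-out loop described above can be sketched as follows. This is a minimal illustration and not the authors' code: the `fit` and `evaluate` callables, the toy "model" (a mean score), and the toy patient data are all hypothetical stand-ins for HMM training and forecast scoring.

```python
from typing import Any, Callable, Dict, List, Sequence

Scores = Sequence[int]  # one patient's daily ICANS grades (0-4)


def leave_one_patient_out(
    patients: Dict[str, Scores],
    fit: Callable[[List[Scores]], Any],
    evaluate: Callable[[Any, Scores], float],
) -> Dict[str, float]:
    """Train on all patients but one, then score the held-out patient."""
    results: Dict[str, float] = {}
    for held_out_id, held_out_scores in patients.items():
        training = [s for pid, s in patients.items() if pid != held_out_id]
        model = fit(training)
        results[held_out_id] = evaluate(model, held_out_scores)
    return results


# Hypothetical stand-ins: the "model" is the mean training score and the
# evaluation is the mean absolute error on the held-out patient.
def fit_mean(training: List[Scores]) -> float:
    flat = [y for scores in training for y in scores]
    return sum(flat) / len(flat)


def mean_abs_error(model: float, scores: Scores) -> float:
    return sum(abs(model - y) for y in scores) / len(scores)


toy_patients = {"p1": [0, 0, 1, 2], "p2": [0, 0, 0, 0], "p3": [0, 1, 3, 1]}
loo_results = leave_one_patient_out(toy_patients, fit_mean, mean_abs_error)
```

Because every patient is scored by a model that never saw that patient's data, the per-patient results can be aggregated without the optimistic bias of in-sample evaluation.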

### HMM structure and parameter estimation

To construct our HMM we first defined four hidden substates based on observable outputs (ICANS scores), as follows:

1. pre-ICANS (ICANS=0).
2. rising phase: ICANS>0 has begun and tends to rise until reaching a peak.
3. falling phase: ICANS>0, but after the peak value.
4. post-ICANS (ICANS=0 following the falling phase).

In terms of these substates and the most recent available ICANS observation, we defined the hidden state of the patient at time t as $x_t = (s_t, y_t)$, where $s_t \in \{1, 2, 3, 4\}$ is the substate and $y_t$ stands for the ICANS score at time t. $x_t$ can take M=20 different values (figure 1B), $x_t \in \{1, \dots, M\}$, and we denote the probability of transition from state $i$ to state $j$ as $a_{ij} = P(x_{t+1} = j \mid x_t = i)$. Observations (ICANS scores) and hidden states are related through the conditional probability $b_j(k) = P(y_t = k \mid x_t = j)$, where $k \in \{0, 1, 2, 3, 4\}$ are the five levels of ICANS scores.

With these definitions, the state transition matrix A and conditional observation probability matrix B dimensions are 20 × 20 and 20 × 5, respectively. Values of these matrices were estimated nonparametrically from the training data,21 22 with the exception of the transition marking the initial onset of ICANS, which corresponds to the transition from the pre-ICANS substate (ICANS=0) to the rising phase (ICANS>0) (see figure 1B). This transition was modeled parametrically, as described in the next section.
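One simple way to estimate a transition matrix nonparametrically is to count observed transitions and normalize each row. The sketch below illustrates that idea for the 4 substates × 5 grades = 20 combined states; it is not the authors' procedure. The index encoding `substate * 5 + grade`, the rule that the peak is the first day with the maximum grade, and the rule that any zero grade after onset counts as post-ICANS are assumptions made here for illustration.

```python
from typing import List, Sequence

N_SUBSTATES, N_SCORES = 4, 5      # (pre, rising, falling, post) x grades 0-4
M = N_SUBSTATES * N_SCORES        # 20 combined states


def label_states(scores: Sequence[int]) -> List[int]:
    """Map daily ICANS grades to combined state indices substate * 5 + grade.

    Assumptions: the peak is the first day with the maximum grade, and any
    zero grade after onset is treated as post-ICANS.
    """
    peak_day = scores.index(max(scores)) if max(scores) > 0 else -1
    seen_onset = False
    states: List[int] = []
    for day, y in enumerate(scores):
        if y > 0:
            seen_onset = True
        if not seen_onset:
            sub = 0               # pre-ICANS
        elif y > 0 and day <= peak_day:
            sub = 1               # rising phase (up to and including the peak)
        elif y > 0:
            sub = 2               # falling phase (after the peak)
        else:
            sub = 3               # post-ICANS
        states.append(sub * N_SCORES + y)
    return states


def estimate_transitions(sequences: Sequence[Sequence[int]]) -> List[List[float]]:
    """Row-normalized transition counts; rows never visited stay all zero."""
    counts = [[0.0] * M for _ in range(M)]
    for scores in sequences:
        states = label_states(scores)
        for i, j in zip(states, states[1:]):
            counts[i][j] += 1.0
    for row in counts:
        total = sum(row)
        if total > 0:
            for j in range(M):
                row[j] /= total
    return counts


# Toy sequences (hypothetical), one list of daily grades per patient.
A_hat = estimate_transitions([[0, 0, 1, 2, 1, 0], [0, 0, 0, 0], [0, 1, 3, 3, 2, 0, 0]])
```

With a cohort of a few hundred patients many of the 400 entries will have no observations, so in practice some smoothing or structural constraint on allowed transitions would likely be needed.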

### Modeling ICANS onset

Prior studies suggest that the probability of developing ICANS, the time course, and severity of ICANS depend on baseline characteristics like age and clinical data like laboratory values or vital signs. Therefore, in developing the HMM we modeled the initial transition from ICANS=0 to ICANS>0 as a function of time using a pooled logistic regression (PLR) approach. Ten predictors (input features) were included in the model: three functions of time since CAR T-cell infusion (including t itself), age, maximum daily body temperature, and daily serum CRP, ferritin, IL-6, procalcitonin, and WBC. These clinical and laboratory features have been previously shown to consistently correlate with risk of neurotoxicity,9 10 12 13 and are routinely measured in most patients who receive CAR T-cell treatment. We defined the PLR model to have the following form:

(1)

where λ is a non-negative tuning parameter that controls the sparsity of the model (ie, the number of coefficients with a value of zero) and is selected by 10-fold internal cross validation. In this equation, the model coefficients enter as a transposed vector multiplying the predictor variables described above, together with a constant (intercept) term. The model also includes a function of time that models the average profile of ICANS across the population. To model this function, we first calculated the empirical probability of ICANS onset vs time in the training data and then fit a function to it:

(2.a)

(2.b)

where the fitted function has several free parameters and one quantity fixed at the median time of ICANS onset in our cohort. The free parameters were estimated using the Levenberg-Marquardt algorithm.23 According to the fitted function (figure 2A), the risk of ICANS onset tends to increase over an initial time interval and then decreases to zero.
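Pooled logistic regression is usually fitted on a "person-period" table in which each patient contributes one row per day at risk. The sketch below shows one way to build such rows for the onset model. It is an illustration under stated assumptions, not the authors' pipeline: the feature names (`t`, `age`, `max_temp_c`), the choice to pair same-day covariates with same-day onset, and the toy values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Dict[str, float], int]  # (features for one patient-day, onset label)


def person_period_rows(
    scores: Sequence[int],
    daily_covariates: Sequence[Dict[str, float]],
    age: float,
) -> List[Row]:
    """Expand one patient into patient-day rows for pooled logistic regression.

    A patient contributes one row per day while still at risk (ICANS=0 so far).
    The label is 1 on the day ICANS first becomes >0; no rows follow onset.
    """
    rows: List[Row] = []
    for day, (y, cov) in enumerate(zip(scores, daily_covariates)):
        features = {"t": float(day), "age": age, **cov}
        onset = int(y > 0)
        rows.append((features, onset))
        if onset:
            break  # the patient leaves the risk set once ICANS has begun
    return rows


# Toy patient (hypothetical values): onset on day 2.
example_rows = person_period_rows(
    scores=[0, 0, 1, 2, 0],
    daily_covariates=[
        {"max_temp_c": 37.0},
        {"max_temp_c": 38.4},
        {"max_temp_c": 39.1},
        {"max_temp_c": 38.0},
        {"max_temp_c": 37.2},
    ],
    age=61.0,
)
```

Rows from all training patients would then be stacked and passed to a penalized logistic regression solver, with the penalty strength chosen by internal cross validation as described above.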

### HMM forecasting

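To forecast ICANS scores, we compute $P(x_t \mid y_{1:t})$, the distribution over hidden states $x_t$ given the ICANS scores $y_{1:t}$ observed up to time t, by applying Bayes rule sequentially using the HMM forward algorithm.21 24 The forward algorithm consists of two alternating steps: one-step ahead prediction and updating. The one-step ahead prediction density is:

$$
P(x_t = j \mid y_{1:t-1}) = \sum_{i=1}^{M} P(x_t = j \mid x_{t-1} = i)\, P(x_{t-1} = i \mid y_{1:t-1}) \tag{3}
$$

Then, in the update step we update the density based on data observed at time t:

$$
\begin{aligned}
P(x_t = j \mid y_{1:t}) &= \frac{1}{Z_t}\, P(y_t \mid x_t = j, y_{1:t-1})\, P(x_t = j \mid y_{1:t-1}) \\
&= \frac{1}{Z_t}\, P(y_t \mid x_t = j)\, P(x_t = j \mid y_{1:t-1})
\end{aligned} \tag{4}
$$

where in the last line we have used the fact that $y_t$ is independent of $y_{1:t-1}$ given $x_t$, and the normalization constant, $Z_t$, is given by:

$$
Z_t = P(y_t \mid y_{1:t-1}) = \sum_{j=1}^{M} P(y_t \mid x_t = j)\, P(x_t = j \mid y_{1:t-1}) \tag{5}
$$

Using equations (3) and (4), we can write one complete iteration of the predict-update cycle as:

$$
P(x_t = j \mid y_{1:t}) = \frac{1}{Z_t}\, P(y_t \mid x_t = j) \sum_{i=1}^{M} P(x_t = j \mid x_{t-1} = i)\, P(x_{t-1} = i \mid y_{1:t-1}) \tag{6}
$$

To predict future states given past observations, we next compute $P(x_{t+h} \mid y_{1:t})$, where $h$ is the prediction horizon. We perform this calculation by applying the transition matrix $A$, with entries $a_{ij} = P(x_{t+1} = j \mid x_t = i)$, to the current distribution over hidden states as follows:

$$
P(x_{t+h} = j \mid y_{1:t}) = \sum_{i=1}^{M} \left[A^{h}\right]_{ij}\, P(x_t = i \mid y_{1:t}) \tag{7}
$$

Finally, we use the quantity $P(x_{t+h} \mid y_{1:t})$ to make predictions about future ICANS scores using:

$$
P(y_{t+h} = k \mid y_{1:t}) = \sum_{j=1}^{M} P(y_{t+h} = k \mid x_{t+h} = j)\, P(x_{t+h} = j \mid y_{1:t}) \tag{8}
$$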
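The predict-update cycle and the h-step forecast can be sketched in a few lines of plain Python. This is a generic forward-algorithm illustration, not the study's implementation: it uses a fixed transition matrix (the study's onset transition depends on time and covariates), and the two-state toy matrices `A_toy` and `B_toy` and the uniform prior are made-up numbers.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def predict(belief: Vector, A: Matrix) -> Vector:
    """One-step-ahead prediction: new_j = sum_i belief_i * A[i][j]."""
    n = len(A)
    return [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]


def update(predicted: Vector, B: Matrix, obs: int) -> Vector:
    """Bayes update with the observed level `obs`, then normalization."""
    unnorm = [predicted[j] * B[j][obs] for j in range(len(predicted))]
    z = sum(unnorm)  # normalization constant
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnorm]


def forecast(prior: Vector, A: Matrix, B: Matrix,
             observations: Sequence[int], horizon: int) -> Vector:
    """Distribution over observation levels `horizon` steps after the last observation."""
    belief = update(prior, B, observations[0])
    for obs in observations[1:]:
        belief = update(predict(belief, A), B, obs)
    for _ in range(horizon):          # propagate without new observations
        belief = predict(belief, A)
    n_levels = len(B[0])
    return [sum(belief[j] * B[j][k] for j in range(len(belief)))
            for k in range(n_levels)]


# Toy two-state, two-level model (hypothetical numbers, not from the study).
A_toy = [[0.9, 0.1], [0.2, 0.8]]
B_toy = [[0.95, 0.05], [0.10, 0.90]]
probs = forecast([0.5, 0.5], A_toy, B_toy, observations=[0, 0, 1], horizon=2)
nowcast = forecast([0.5, 0.5], A_toy, B_toy, observations=[0], horizon=0)
```

Repeated calls to `predict` are equivalent to multiplying by a matrix power, which is cheaper to write than to precompute for the short horizons (1 to 7 days) considered here.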

### Evaluation of forecast accuracy

To evaluate the quality of forecasts, we split ICANS scores of all patients into training and testing sets, using all but one patient’s data for model training. We then ran the forward algorithm (equations (3–8)) on data from the one patient left out to forecast ICANS scores h days ahead, where h is the forecast horizon. We considered values of h ranging from 1 to 7.

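We quantified the quality of forecasts using the following standard metrics25: forecast bias, mean absolute deviation (MAD), and weighted average percentage error (WAPE):

$$
\text{Bias} = \frac{1}{n} \sum_{t=1}^{n} \left(\hat{y}_t - y_t\right) \tag{9.a}
$$

$$
\text{MAD} = \frac{1}{n} \sum_{t=1}^{n} \left|\hat{y}_t - y_t\right| \tag{9.b}
$$

$$
\text{WAPE} = \frac{\sum_{t=1}^{n} \left|\hat{y}_t - y_t\right|}{\sum_{t=1}^{n} \left|y_t\right|} \tag{9.c}
$$

In equation (9.a), forecast bias is the difference between forecast ($\hat{y}_t$) and observed ($y_t$) ICANS scores over the $n$ forecast days. If the forecast overestimates, the forecast bias is positive; for underestimates the forecast bias is negative. MAD measures the magnitude, on average, of forecasting errors. WAPE, also referred to as the MAD/mean ratio, weights the error by the total true ICANS scores. As with MAD, a good forecasting method has a smaller WAPE.25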
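The three metrics are straightforward to compute once forecasts and observations are aligned day by day. The sketch below assumes that bias and MAD are averaged over the forecast days and that WAPE divides the summed absolute error by the summed observed scores; the function names and the toy series are illustrative only.

```python
from typing import Sequence


def forecast_bias(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Mean signed error; positive values mean the forecast overestimates."""
    return sum(f - y for f, y in zip(forecast, observed)) / len(observed)


def mad(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute deviation between forecast and observed scores."""
    return sum(abs(f - y) for f, y in zip(forecast, observed)) / len(observed)


def wape(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Summed absolute error divided by the summed observed scores."""
    total = sum(abs(y) for y in observed)
    if total == 0:
        raise ValueError("WAPE is undefined when all observed scores are zero")
    return sum(abs(f - y) for f, y in zip(forecast, observed)) / total


# Toy series (hypothetical): one over-forecast and one under-forecast day.
observed_scores = [0, 1, 2, 3, 1]
forecast_scores = [0, 2, 2, 2, 1]
bias_value = forecast_bias(forecast_scores, observed_scores)
mad_value = mad(forecast_scores, observed_scores)
wape_value = wape(forecast_scores, observed_scores)
```

In this toy case the two errors cancel in the bias but not in MAD or WAPE, which is why bias is reported alongside the magnitude-based metrics. WAPE is undefined for a patient whose observed scores are all zero, so it has to be computed over a pooled set of days rather than patient by patient.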

### Evaluation of binary predictions of ICANS/severe ICANS

In addition to the probabilistic performance metrics described above, we also evaluated the performance of the model to make binary predictions about which patients will develop ICANS/severe ICANS. We did this in two ways.

(1) Predicting from day 5: Given data gathered on the first 5 days following CAR T-cell treatment, calculate the forecast probability that ICANS>0 for each of 23 forecast days. We make a single binary prediction for each patient by comparing the forecasts for these 23 days with a threshold θ, predicting that ICANS will occur if the forecast probability exceeds θ on any of those days, and otherwise predicting that ICANS will not occur. We predict severe ICANS in the same way but using the forecast probability that ICANS>2.

(2) Predicting 5 days ahead: Beginning on day 0, calculate the probability that ICANS>0 in 5 days, and repeat this on each subsequent day t=1, 2, …, which produces 19 probabilities (for days 5, 6, …, 23) for each patient. We make a single binary prediction for each patient by comparing the forecasts for these 19 days with a threshold θ, predicting that ICANS will occur if the forecast probability exceeds θ on any of those days, and otherwise predicting that ICANS will not occur. We predict severe ICANS in the same way but using the forecast probability that ICANS>2.

For both methods (1) and (2), for predicting neurotoxicity (ICANS>0) or severe neurotoxicity (ICANS>2), each choice of the threshold value θ results in a binary prediction for each patient, which allows us to calculate an overall sensitivity and false positive rate for the algorithm. We generate receiver operating characteristic (ROC) curves by varying the threshold value and observing how the sensitivity and false positive rates vary.
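The "any day above threshold" rule is equivalent to thresholding each patient's maximum daily forecast, so an ROC curve can be traced by sweeping θ over those maxima. The following is a minimal sketch under that observation. The toy forecasts and labels are hypothetical, the comparison uses "greater than or equal to" at each candidate threshold, and the code assumes at least one patient with and one without ICANS.

```python
from typing import List, Sequence, Tuple


def patient_score(daily_probs: Sequence[float]) -> float:
    """'Any day above threshold' equals thresholding the daily maximum."""
    return max(daily_probs)


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, sensitivity) pairs, one per candidate threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for theta in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy daily forecasts for five patients (hypothetical) and whether each
# patient actually developed ICANS (1) or not (0).
daily_forecasts = [[0.1, 0.8, 0.4], [0.2, 0.6], [0.3, 0.1], [0.7, 0.2], [0.2, 0.1]]
developed_icans = [1, 1, 0, 0, 1]
patient_scores = [patient_score(p) for p in daily_forecasts]
roc = roc_points(patient_scores, developed_icans)
toy_auc = auc(roc)
```

Each point on the curve corresponds to one value of θ, so a clinical operating point can be chosen by reading off the sensitivity and false positive rate that best fit the intended use, for example early discharge versus intensified monitoring.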

## Results

Our cohort included 199 patients who underwent CAR T-cell therapy. Among these, 97 (48.7%) developed ICANS. The median (IQR) day of ICANS onset was 9 (1, 20). The duration of neurotoxicity ranged from 2 to 65 days (mean (SD), 6.6 (10.9) days). Online supplemental table S1 summarizes the patient characteristics and online supplemental file 1 shows the time courses of ICANS scores for all patients.

### Forecasting onset probability

We first investigated the model’s ability to predict ICANS onset. Factors identified as conferring increased baseline risk were older age, higher body temperature, and higher serum levels of CRP, IL-6, and procalcitonin. These features were retained in the lasso-penalized logistic regression model. Figure 2B shows the model coefficients along with the 95% confidence intervals corresponding to the seven clinical baseline predictors that were included in the PLR model. As shown in figure 2B, the coefficients of ferritin and WBC are close to zero (0.1), and therefore, they are not strong predictors in our cohort. In figure 3, we show four examples of individual patients who underwent CAR T-cell infusion, including the observed ICANS score (orange) and the predicted probability of ICANS onset (blue) for each day of hospitalization. For the two patients who developed ICANS (figure 3A,C), the ICANS onset probability initially increased for 9 and 7 days after cell infusion, respectively, and then decreased. In contrast, for the patients who did not develop ICANS (figure 3B,D), the probability of ICANS onset remained low (less than 0.2) throughout the window in which ICANS typically first develops.

### Forecasting days with ICANS and severe ICANS

We next explored the model’s ability to predict on which days a patient would have any neurotoxicity (ICANS>0) or severe ICANS (ICANS>2) when predicting from day 5. Probabilities of each of these events as a function of time are shown in online supplemental figure S1. In online supplemental figure S2A, the first 53 patients have low probability of developing ICANS and online supplemental figure S2B shows the probability of developing severe ICANS. The model correctly predicts (absolute error of zero) the day of onset in 70.5% (n=134) and 62.2% (n=109) of cases for ICANS and severe ICANS, respectively. In 9.6% and 5.7% of cases the error is 1 day; the error is >1 day in 19.8% and 32% of cases for ICANS and severe ICANS, respectively (online supplemental file 1). As expected, predicting severe ICANS is more challenging and incurs more error.

In addition, the ROC curves of the binary predictions of ICANS/severe ICANS are displayed in online supplemental figure S2E,F. The model correctly predicts which patients develop ICANS/severe ICANS with an area under the curve (AUC) of 96.7% and 93.2% when predicting 5 days ahead, and with an AUC of 93.2% and 80.6% when predicting from day 5. Online supplemental file 1 also shows the model behavior when forecasting the future probability of developing ICANS or severe ICANS after varying numbers of days of observations.

### Forecasting ICANS scores one to seven days ahead

We next evaluated the model’s ability to forecast ICANS scores between 1 and 7 days ahead. Results for two patients (one who developed ICANS, and another who did not) are illustrated in figure 4. As expected, predicted probabilities deviate more from the observed ICANS course as the forecasting horizon is extended, particularly beyond 3 days. We also note that, as we ask the model to forecast further into the future, there is a corresponding shift in the forecasted onset time of risk for neurotoxicity. This is expected behavior. In figure 4A, the shift is predominantly due to the onset of ICANS on day 8. When forecasting 1 day ahead, this change on day 8 is reflected in the day 9 forecast: the model predicts continuation of risk. In 7 days ahead forecasts, this same day 8 onset cannot be reflected until day 15. In figure 4B, when predicting 1 day ahead, we also observe an increased risk that peaks on day 10, due to the population average risk trajectory, but also modulated by the evolving laboratory values and vital signs. This peak also shifts to approximately day 16 when we increase the forecasting horizon to 7 days, because this is when this external information first enters the forecasting lookback window.

Forecasting performance results are shown in figure 5. MAD and WAPE, respectively, ranged from 0.38 (0.16) to 0.66 (0.25) and from 0.28 (0.07) to 0.46 (0.10) for 1 to 7 day(s) ahead forecasting. All metrics showed reduced performance when forecasting 4 or more days ahead.

To further explore how the model behaves, we calculated forecasted ICANS trajectories after varying numbers (1, 3, 5, 7) of days of observations following CAR T-cell infusion. Results are shown in online supplemental figure S4 for the patient in figure 4. As expected, when the observation period does not include any days with ICANS>0, then the probability of later developing ICANS is predicted to be relatively small for all future dates (online supplemental figure S4A-C). However, when the observation period includes nonzero ICANS observations (online supplemental file 1), the model forecasts a future trajectory lasting ~3 weeks with elevated risk of continuing to experience ICANS.

## Discussion

In this work, we developed statistical models to forecast ICANS scores, in order to predict not only day of onset and course of ICANS, but also which patients will develop ICANS and severe ICANS. Early and accurate detection of ICANS is of great clinical interest, because not all patients develop ICANS, but those who do require prolonged hospitalization and supportive care.9 As a result, statistical models such as these could aid clinical diagnosis and decision making.

We demonstrate high accuracy when predicting the day of onset of ICANS, which is an important clinical time point for management decisions such as steroid administration.12 The examples shown of patients who developed and did not develop ICANS demonstrate the ability of the model to modulate the predicted onset probability using clinical and laboratory covariates from individuals with different baseline levels of risk.

Knowledge of which patients will develop any degree of ICANS and severe ICANS also has significant clinical implications, particularly affecting length of stay and level of inpatient care (floor vs Intensive Care Unit (ICU)12). Through the HMM+PLR method, we improve upon prior studies predicting which patients develop ICANS of any grade or severe ICANS based on data in the first few days following CAR T-cell infusion. We correctly predict which patients developed ICANS and severe ICANS when predicting 5 days ahead, with area under the receiver operating characteristic curve of 96.7% and 93.2%, respectively. By comparison, a prior multivariable logistic regression model correctly predicted the occurrence of any ICANS from day 5 with an AUC of 74% and accuracy of 77%.9 Another logistic regression model incorporating the modified endothelial activation and stress index score (m-EASIX) had an AUC of 73% in predicting severe ICANS during hospitalization based on data from day 3 following CAR T-cell infusion.26

Our HMM+PLR model is the first to demonstrate forecasting of daily ICANS grades. The model shows excellent performance when predicting 1–3 days ahead, as reflected by MAD and WAPE metrics from 0.38 (0.16) to 0.5 (0.18) and 0.28 (0.07) to 0.38 (0.08). Performance decreases for predictions from day 4 to day 7, with greater divergence from observed values, as expected when forecasting over longer time periods. Nonetheless, the window of 1–3 days is still clinically meaningful to aid decision making, and the forecasting errors even beyond day 3 are relatively small and within clinically useful limits. According to the bias metric, the model tends to underestimate ICANS risk slightly, which may have arisen because the majority of patients do not typically develop ICANS.

The features in the model of onset probability were informed by prior studies showing associations between ICANS and clinical values, but our lasso regularization approach highlights the importance of a subset of these features in predicting ICANS occurrence. As in a prior study by Rubin *et al* involving many of the same patients included in our cohort, older age, fever, and serum CRP were notable predictors of ICANS. However, in contrast, our model also retained serum IL-6 and procalcitonin, an inflammatory cytokine and an acute phase reactant, respectively. Other predictors in Rubin *et al*, such as histological subtype, ferritin, minimum WBC, CRS severity, CRS onset day, and number of doses of tocilizumab, were not retained in our model, which could reflect missing values or indicate less importance of these features. Regardless, fewer features could ease implementation of the model, as both IL-6 and procalcitonin are now often measured routinely at tertiary care centers caring for these patients.

Our study has several limitations. First, ICANS grades were determined via chart review, and thus may contain noise. Second, our dataset has substantial missing data, especially for procalcitonin and IL-6. We might achieve higher forecasting accuracy with less missing data. Third, our dataset includes patients from 2015 to 2020, a time period in which management of these patients has evolved, reflecting greater understanding and familiarity with CAR T-cell therapy. Newer treatments that may have greater efficacy in treating ICANS, such as anakinra, are also under clinical trial. Fourth, the model does not directly account for treatments or clinical developments that may influence the time course of ICANS (eg, development of cerebral edema or status epilepticus; or use of anesthetic drugs). Accounting for these aspects of ICANS might further improve model forecasts but would require fitting a more complex model and would thus require a larger dataset. The model also arises from patients at a single center, so may lack generalizability to other institutions. There are also differences in the rates of ICANS with different CAR T-cell products and malignancy diagnoses, which our model does not account for due to limitations in sample size. A larger dataset could allow for sub-analyses within malignancy subtypes and CAR T-cell products. In addition, our cohort included patients receiving two different CAR T-cell therapies, CD19 and BCMA, which may have somewhat different ICANS time courses. In future studies with sufficiently large cohorts taking each therapy, forecasting might be further improved by developing separate models for each therapy and by accounting for the effects on the course of ICANS of these additional events and interventions.

Future studies could improve performance through additional risk indicators that capture the dynamic nature of ICANS, such as EEG. Patients who develop ICANS exhibit marked EEG changes, such as increased delta and theta activity, generalized periodic discharge, and seizures, and accounting for these changes and their time course might further enhance forecasting. Whether there are EEG changes that precede ICANS or features of the EEG that might indicate increased risk for ICANS, is unknown, and warrants investigation.

## Conclusions

In addition to forecasting ICANS scores up to 7 days into the future, the forecasting model developed in this study predicted which patients are likely to experience ICANS/severe ICANS and the time course ICANS/severe ICANS is likely to follow once it has begun. With further external validation, this model may be useful for triaging and resource allocation and may allow a large proportion of patients to be safely discharged from the hospital early.

## Ethics statements

### Patient consent for publication

### Ethics approval

This study involves human participants and was conducted under a protocol approved by the Mass General Brigham (MGB) Institutional Review Board (IRB protocol number 2013P001024) using a waiver of written informed consent. Participants gave informed consent to participate in the study before taking part.

## References

## Footnotes

Contributors CAE, YA, MBW designed the study. CAE, SAQ, PM, MSF, HHD and DBR collected data. CAJ provided patient and data resources. YA, CAE, MBW analyzed data, interpreted data, did the literature search, wrote the manuscript and created the figures. All authors reviewed and revised the manuscript.

Funding MBW received funding from the Glenn Foundation for Medical Research and American Federation for Aging Research (Breakthroughs in Gerontology Grant); American Academy of Sleep Medicine (AASM Foundation Strategic Research Award); NIH (R01NS102190, R01NS102574, R01NS107291, RF1AG064312, R01AG062989); and NSF (award SCH-2014431).

Competing interests MBW and SSC are co-founders of Beacon Biosignals, which played no role in this work. DBR served on the scientific advisory board for Celgene/Bristol Myers Squibb. JD received research support from Novartis, consulting fees from Amgen, Blue Earth Diagnostics, and Syndax and royalties from contributing to UpToDate. CAJ received consulting fees from Kite/Gilead, Novartis, BMS/Celgene, Bluebird Bio, Epizyme, Instill-Bio, Lonza, Ipsen, Abintus-Bio, Daiichi-Sankyo and research funding from Kite/Gilead and Pfizer. Other authors report no disclosures relevant to the manuscript.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.