Original research
A Machine learning model trained on dual-energy CT radiomics significantly improves immunotherapy response prediction for patients with stage IV melanoma
  1. Andreas Stefan Brendlin1,
  2. Felix Peisen1,
  3. Haidara Almansour1,
  4. Saif Afat1,
  5. Thomas Eigentler2,3,
  6. Teresa Amaral2,
  7. Sebastian Faby4,
  8. Adria Font Calvarons4,
  9. Konstantin Nikolaou1,5 and
  10. Ahmed E Othman1,6
  1. 1Department of Diagnostic and Interventional Radiology, Universitätsklinikum Tübingen, Tubingen, Germany
  2. 2Center of Dermatooncology, Department of Dermatology, Eberhard Karls Universitat Tubingen, Tubingen, Germany
  3. 3Department of Dermatology, Venereology and Allergology, Charite Universitatsmedizin Berlin, Berlin, Germany
  4. 4Computed Tomography, Siemens Healthcare GmbH, Erlangen, Germany
  5. 5Image-guided and Functionally Instructed Tumor Therapies (iFIT), The Cluster of Excellence 2180, Tuebingen, Germany
  6. 6Institute of Neuroradiology, Johannes Gutenberg University Hospital Mainz, Mainz, Germany
  1. Correspondence to Professor Ahmed E Othman; ahmed.e.othman{at}googlemail.com

Abstract

Background To assess the additive value of dual-energy CT (DECT) over single-energy CT (SECT) to radiomics-based response prediction in patients with metastatic melanoma preceding immunotherapy.

Material and methods A total of 140 consecutive patients with melanoma (58 female, 63±16 years) for whom baseline DECT tumor load assessment revealed stage IV and who were subsequently treated with immunotherapy were included. Best response was determined using the clinical reports (81 responders: 27 complete response, 45 partial response, 9 stable disease). Individual lesion response was classified manually analogous to RECIST 1.1 through 1291 follow-up examinations on a total of 776 lesions (6.7±7.2 per patient). The patients were sorted chronologically into a study and a validation cohort (each n=70). The baseline DECT was examined using specialized tumor segmentation prototype software, and radiomic features were analyzed for response predictors. Significant features were selected using univariate statistics with Bonferroni correction and multiple logistic regression. The area under the receiver operating characteristic curve of the best subset was computed (AUROC). For each combination (SECT/DECT and patient response/lesion response), an individual random forest classifier with 10-fold internal cross-validation was trained on the study cohort and tested on the validation cohort to confirm the predictive performance.

Results We performed manual RECIST 1.1 response analysis on a total of 6533 lesions. Multivariate statistics selected significant features for patient response in SECT (min. brightness, R²=0.112, padj. ≤0.001) and DECT (textural coarseness, R²=0.121, padj. ≤0.001), as well as lesion response in SECT (mean absolute voxel intensity deviation, R²=0.115, padj. ≤0.001) and DECT (iodine uptake metrics, R²≥0.12, padj. ≤0.001). Applying the machine learning models to the validation cohort confirmed the additive predictive power of DECT (patient response AUROC SECT=0.5, DECT=0.75; lesion response AUROC SECT=0.61, DECT=0.85; p<0.001).

Conclusion The new method of DECT-specific radiomic analysis provides a significant additive value over SECT radiomics approaches for response prediction in patients with metastatic melanoma preceding immunotherapy, especially on a lesion-based level. As mixed tumor response is not uncommon in metastatic melanoma, this provides a powerful tool for clinical decision-making and may be an essential step toward individualized medicine.

  • melanoma

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.

Introduction

Melanoma is a highly aggressive cutaneous tumor that accounts for up to 80% of all skin cancer-related deaths worldwide, making it the most lethal skin malignancy.1 It mainly affects fair-skinned people at a median age ranging from 57 to 62 years; however, it is reported to be one of the most common primary malignancies in women under 30 years.2 The incidence of melanoma has been rising worldwide in the last decades, giving reason for great concern and continuous clinical research.3 In the localized stage, melanoma is treatable with surgery and has a reasonably high 5-year survival rate of up to 98%.4 Unfortunately, melanoma tends to metastasize, and surgical approaches are often no longer sufficient.5 Furthermore, patients with metastatic disease have a poor prognosis, with 5-year survival under radiation and conventional chemotherapy ranging from 10% to 50%.6 In the last decade, novel immunotherapy approaches have significantly extended overall survival (OS) rates in metastatic melanoma.7 Unfortunately, though, heterogeneous response to immunotherapy is not uncommon, with some lesions in one patient showing good responsiveness while others may even be further progressive.8 The high mortality in the advanced stages and costs due to the complexity of treatment and continued diagnostics further add to the burden the tumor places on public healthcare.9 This has sparked many research projects worldwide to evaluate predictive biomarkers in patients with metastatic melanoma.10–12 In medical imaging, dual-energy CT (DECT) has been shown to outperform single-energy CT (SECT) because its superior exploitation of spectral information for diagnostic purposes may improve the visualization of biological processes.13 More recently, the growing field of radiomic analysis, a process of turning medical images into mineable data, has opened up new approaches to identify predictive biomarkers.14 15 To the best of our knowledge, there have not been any attempts to predict responsiveness in patients with metastatic melanoma using the new method of dual-energy specific radiomic analysis.

Therefore, the scope of our study was to assess the additive value of DECT over SECT to radiomic-based response prediction in patients with metastatic melanoma undergoing immunotherapy. We hypothesized that the inherent material decomposition capabilities of DECT can provide important additional information by visualizing physiological biomarkers.

Methods

Study population and therapy scheme

From January 1, 2015 to October 1, 2019, we consecutively included patients with the initial diagnosis of melanoma, for whom a whole-body DECT tumor load assessment revealed tumor stage IV and who were subsequently treated with immune checkpoint inhibitors (anti-CTLA-4 [cytotoxic T-lymphocyte-associated protein 4], anti-PD1 [programmed cell death protein 1], or combination anti-CTLA-4/anti-PD1). From the patients’ clinical reports, we collected their mutation status (v-Raf murine sarcoma viral oncogene homolog B [BRAF], proto-oncogene c-KIT [cKIT], neuroblastoma RAS viral (v-ras) oncogene homolog [NRAS]) and lactate dehydrogenase (LDH) and S-100β levels at stage IV diagnosis. We determined progression-free survival (PFS, stage IV until further progression) and overall survival (OS, stage IV until death/end of study time frame) in concordance with the patients’ clinical reports, as well as the institute’s tumor board reviews.

Image acquisition and reconstruction parameters

All whole-body DECT examinations that initially revealed stage IV melanoma were contrast-enhanced (Imeron 400, Bracco, Milan, Italy) and performed on the same third-generation dual-source CT scanner (SOMATOM Force; Siemens Healthineers, Forchheim, Germany). Contrast agent (patient’s body weight in kg + 15 = contrast agent in mL), as well as a subsequent saline flush (40 mL), was administered through a peripheral vein cannula by a double-syringe power injector (Medrad; Bayer, Leverkusen, Germany) at a flow rate of 2 mL/s. Image acquisition took place in the portal venous phase (90 s after the application). Attenuation-based tube current modulation (CARE Dose4D, reference mAs 190) was activated for the examination. Tube voltage was set to 100/Sn150 (tube A 100 kV, tube B tin-filtered 150 kV). Collimation was set to 0.6×192/128 mm, pitch was 0.6, and gantry rotation time was 0.5 s. The composite CT images were reconstructed in a medium soft kernel (Bf40d) with a slice thickness of 3 mm. The five DE-specific image subtypes (tube A and B, Mixed, VNC, Iodine) were reconstructed employing a quantitative kernel without overshoots (Qr40d) at 3 mm slice thickness with iodine-enabled iterative beam hardening correction to allow for accurate three-material decomposition. DECT image acquisition produces a composite image (40% tube A / 60% tube B) that visually closely resembles SECT images at 120 kV. As opposed to SECT imaging, which allows for visual interpretation only, the material decomposition properties of DECT can furthermore be used to quantify iodine and fat concentration (‘VNC’ and ‘Iodine’ images).16
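As a rough illustration of the two numeric rules in this paragraph, the following sketch computes the contrast volume (body weight in kg + 15 = mL) and a 40%/60% weighted composite of tube A and tube B attenuation values. The function names are hypothetical, and treating the composite as a simple voxel-wise linear blend is an assumption made here for illustration; the scanner's actual reconstruction may differ.

```python
def contrast_volume_ml(body_weight_kg: float) -> float:
    """Contrast agent volume in mL: body weight in kg plus 15 (rule stated in the text)."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_kg + 15.0


def composite_voxel(hu_tube_a: float, hu_tube_b: float, weight_a: float = 0.4) -> float:
    """Weighted blend of one voxel (assumption: plain linear mixing, 40% A / 60% B)."""
    return weight_a * hu_tube_a + (1.0 - weight_a) * hu_tube_b


def composite_image(tube_a, tube_b, weight_a: float = 0.4):
    """Blend two equally sized flat lists of attenuation values voxel by voxel."""
    if len(tube_a) != len(tube_b):
        raise ValueError("tube A and tube B images must have the same size")
    return [composite_voxel(a, b, weight_a) for a, b in zip(tube_a, tube_b)]
```

For example, a patient weighing 75 kg would receive 90 mL of contrast agent under this rule.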

Tumor load assessment and classification of response

Initial tumor load and response under immunotherapy were assessed according to the Response Evaluation Criteria in Solid Tumours, version 1.1 (RECIST 1.1)17 using the commercially available software solution Mint.Lesion (Mint Medical, 69121, Heidelberg, Germany). Each included patient received quarterly whole-body CT (WBCT) follow-up staging examinations to evaluate tumor response under immunotherapy. For tumor load assessment and response classification, we included every follow-up from January 1, 2015 to January 1, 2021 and focused on best response in the given time frame to account for long-term responsiveness.

The patient response was determined in concordance with the patient’s clinical reports and the institutional tumor board reviews. To facilitate comparison, we classified patients with complete response (CR), partial response (PR), and stable disease (SD) as ‘responder patients’, while patients with progressive disease (PD) were classified as ‘non-responder patients’.

For lesion response, we assessed each lesion individually analogous to RECIST 1.1. Lesions with CR (disappearance), PR (≥30% shrinkage compared with baseline long axis diameter), and SD (no sufficient change compared with smallest long axis diameter (‘nadir’)) were classified as ‘responder lesions’, and lesions with PD (≥20% increase compared with nadir) as ‘non-responder lesions’. Figure 1 is a flow chart of the patient and lesion classification for radiomic analysis.
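The lesion-level thresholds above can be sketched as a small classification function. This is only an illustration of the stated rules, not the workflow used in Mint.Lesion; the function names, the order in which the rules are checked, and the handling of a lesion that reappears after a nadir of 0 mm are assumptions.

```python
def classify_lesion(baseline_mm: float, nadir_mm: float, current_mm: float) -> str:
    """Classify one follow-up measurement of a lesion's long axis diameter."""
    if baseline_mm <= 0 or nadir_mm < 0 or current_mm < 0:
        raise ValueError("diameters must be non-negative and baseline positive")
    if current_mm == 0:
        return "CR"  # disappearance
    if nadir_mm == 0:
        return "PD"  # assumption: reappearance after disappearance counts as progression
    if (current_mm - nadir_mm) / nadir_mm >= 0.20:
        return "PD"  # >=20% increase compared with nadir
    if (baseline_mm - current_mm) / baseline_mm >= 0.30:
        return "PR"  # >=30% shrinkage compared with baseline
    return "SD"      # no sufficient change


def is_responder(label: str) -> bool:
    """CR, PR, and SD count as responders; PD counts as non-responder."""
    return label in {"CR", "PR", "SD"}
```

Checking progression before partial response means that a lesion that shrank and later regrew by at least 20% from its nadir is labeled PD even if it is still smaller than at baseline.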

Figure 1

Flow chart of patient and lesion classification for comparisons and radiomic analysis.

Computation and extraction of radiomic features

After classification of responsiveness, the initial DECT scans that had revealed stage IV melanoma were exported into a specialized tumor segmentation prototype software (eXamine V.2.0.50636.0, Siemens Healthineers, Forchheim, Germany). Each lesion was assessed again using semi-automatic volume-of-interest (VOI) segmentation. For these VOIs, the software computed regular radiomic features from the composite image and DE-specific features for each of the 5 DE image subtypes (tube A, tube B, Mixed, VNC, Iodine) (see figure 1).

Statistical analysis, machine learning training and application

Statistical analysis of patient data and creation of graphs was performed using GraphPad Prism V.9.0.2 for Windows (GraphPad Software, San Diego, California USA). For statistical analysis of the extracted DECT features, a machine learning capable prototype software (Radiomics V.1.2.1, Siemens Healthineers, Forchheim, Germany) was used, which is based on the open-source Python package ‘PyRadiomics’.18 19 The discriminative performance of individual features was analyzed using univariate statistics with determination coefficient (R²), mutual information (MI), and Bonferroni correction for multiple comparisons. A corrected p<0.05 was considered statistically significant. Multiple logistic regression was used to analyze the performance of radiomic feature combinations. From the statistically significant features, redundant and irrelevant features were excluded using a minimum redundancy maximum relevance algorithm. A stepwise forward selection of the remaining features was used to identify the best subset in accordance with the Akaike information criterion (AIC). The Bayesian information criterion (BIC) and the area under the receiver operating characteristic curve (AUROC) for the best subset were computed. AUROC ranges from 0.5 to 0.6 were considered indicative of at most coincidental classification accuracy, from 0.6 to 0.7 of poor, from 0.7 to 0.8 of fair, from 0.8 to 0.9 of good, and from 0.9 to 1.0 of excellent classification accuracy. Odds ratios (ORs) with 95% CIs were computed to describe the likelihood of patients and lesions with the selected subsets to be responders/non-responders.
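Two of the rules in this paragraph are simple enough to sketch directly: the Bonferroni correction and the AUROC interpretation bands. The function names are hypothetical. Because the stated bands share their boundary values, the sketch assumes each band includes its lower bound and excludes its upper bound, and it also labels values below 0.5 as coincidental.

```python
def bonferroni_adjust(p_values):
    """Multiply each p value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def auroc_label(auroc: float) -> str:
    """Map an AUROC to the verbal categories used in the text."""
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie in [0, 1]")
    if auroc < 0.6:
        return "coincidental"
    if auroc < 0.7:
        return "poor"
    if auroc < 0.8:
        return "fair"
    if auroc < 0.9:
        return "good"
    return "excellent"


# Features whose Bonferroni-adjusted p value stays below 0.05 count as significant.
adjusted = bonferroni_adjust([0.0001, 0.01, 0.2])
significant = [p < 0.05 for p in adjusted]
```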

To confirm the best subset’s predictive performance, a random forest classifier with 100 trees was trained using 10-fold cross-validation, with Gini impurity as the split quality criterion. The machine learning model was applied to the validation cohort for verification. We computed sensitivity, specificity, positive and negative predictive value, and overall accuracy of prediction from the model performance report. The AUROC of the prediction was computed to compare patient and lesion response prediction.
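The training and verification steps described above could look roughly like the following scikit-learn sketch. It is not the prototype software used in the study: the data are synthetic, the three features are placeholders, and the use of stratified, shuffled folds and fixed random seeds are assumptions. Only the 100 trees, the Gini criterion, the 10 folds, and the reported metrics come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)


def make_synthetic_cohort(n, rng):
    """Synthetic stand-in for a cohort: 3 placeholder features, binary response label."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 3)) + y[:, None]  # class 1 is shifted by one unit
    return X, y


X_study, y_study = make_synthetic_cohort(200, rng)
X_valid, y_valid = make_synthetic_cohort(200, rng)

# Random forest with 100 trees and Gini impurity as split criterion.
model = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)

# 10-fold internal cross-validation on the study cohort (stratification is an assumption).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_auroc = cross_val_score(model, X_study, y_study, cv=cv, scoring="roc_auc")

# Fit on the full study cohort, then apply to the held-out validation cohort.
model.fit(X_study, y_study)
prob = model.predict_proba(X_valid)[:, 1]
pred = model.predict(X_valid)
tn, fp, fn, tp = confusion_matrix(y_valid, pred, labels=[0, 1]).ravel()

metrics = {
    "auroc": roc_auc_score(y_valid, prob),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
    "accuracy": (tp + tn) / (tp + tn + fp + fn),
}
```

Keeping the validation cohort entirely out of the cross-validation loop, as in the study design, is what allows the validation AUROC to serve as an independent check.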

Results

Study population, tumor load and survival data, response

From January 1, 2015 to October 1, 2019, we included 140 consecutive patients (61 female, 63±16 years) with melanoma for whom a DECT tumor load assessment revealed stage IV and who were subsequently treated with immunotherapy (first half study cohort, second half validation cohort). Both cohorts accounted for 774 lesions (5.5±6.0 per patient). We assessed 1291 WBCT (9.2±6.1 per patient) and performed manual response analysis analogous to RECIST 1.1 on 6533 lesions (46.7±63.0 per patient). See table 1 for further details about our study population.
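The chronological division into a study cohort (first half) and a validation cohort (second half) can be sketched as follows. The record layout and the `baseline_date` field are hypothetical, and assigning the extra patient to the validation cohort when the count is odd is an assumption that does not arise with 140 patients.

```python
from datetime import date


def chronological_split(patients):
    """Sort by baseline examination date and split into first and second half."""
    ordered = sorted(patients, key=lambda p: p["baseline_date"])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


# Hypothetical records for illustration only.
patients = [
    {"id": "P3", "baseline_date": date(2018, 5, 2)},
    {"id": "P1", "baseline_date": date(2015, 3, 14)},
    {"id": "P4", "baseline_date": date(2019, 9, 30)},
    {"id": "P2", "baseline_date": date(2016, 11, 7)},
]
study_cohort, validation_cohort = chronological_split(patients)
```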

Table 1

Study population, baseline tumor load and survival data, response

Radiomic analysis

Exemplary segmented responder and non-responder lesions and feature analysis in two different patients are shown in figure 2.

Figure 2

Segmentation and feature analysis in the prototype software.

Statistical patient response prediction (study cohort)

Patient response

Based on radiomic features available in SECT, the minimum brightness of all lesions (‘original_firstorder_Minimum’) was selected by univariate analysis for multivariate statistics (AUROC=0.67, MI=0.077, R²=0.112, padj. ≤0.001). Multiple logistic regression showed a significant contribution of the selected feature towards patient response classification (p<0.001, adj. R²=0.109, AIC=531.3, BIC=539.3, AUROC=0.67). Patients with higher minimum brightness levels were 2.01 times as likely (OR=2.01, 95% CI 1.63-2.48, p≤0.001) to be non-responders as those with lower averaged minimum brightness levels.
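For readers less familiar with how an odds ratio and its confidence interval follow from a logistic regression coefficient, here is a minimal sketch. It assumes a Wald-type 95% interval, which the article does not confirm. The coefficient and standard error are not reported in the article; the values below are illustrative, chosen so that the result roughly matches the OR and interval quoted above.

```python
import math


def odds_ratio_with_ci(beta: float, standard_error: float, z: float = 1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient (Wald interval)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper


# Illustrative inputs only: beta = ln(2.01), standard error of about 0.107.
beta = math.log(2.01)
standard_error = 0.107
odds_ratio, lower, upper = odds_ratio_with_ci(beta, standard_error)
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two bounds, which is a quick plausibility check for any reported OR and CI.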

For patient response with DECT radiomics, univariate analysis selected a feature derived from the high energy dataset (tube B) that describes the coarseness of structural textures (‘wavelet-HLH-glrlm_LongRunEmphasis’) of all segmented lesions (AUROC=0.71, MI=0.064, R²=0.121, padj. ≤0.001). Multivariate statistics confirmed this selection and showed the feature to be a significant contributor to patient response classification (p<0.001, adj. R²=0.118, AIC=527.3, BIC=553.2, AUROC=0.70). Overall, patients whose lesions had a higher structural heterogeneity were 1.7 times less likely (OR=1.7, 95% CI 2.66-3.59, p≤0.001) to be non-responders than patients with higher lesion homogeneity.

Lesion response

For individual lesion response based on SECT radiomics, univariate analysis selected a feature (‘original_firstorder_MeanAbsoluteDeviation’) that describes the mean distance of all voxel intensity values from the mean value of the image array (AUROC=0.67, MI=0.088, R²=0.115, padj. ≤0.001). Multiple logistic regression again confirmed this feature’s contribution to lesion response classification (p<0.001, adj. R²=0.113, AIC=495.8, BIC=503.6, AUROC=0.67). Lesions with higher voxel brightness homogeneity were 2.6 times more likely (OR=2.6, 95% CI 1.83-3.57, p≤0.001) to be non-responders than those with higher mean deviance levels.
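The two first-order SECT features named in this section can be written out from their verbal definitions. The sketch below operates on a flat list of voxel intensities inside a segmented lesion; it mirrors the definitions given in the text and is not the PyRadiomics implementation, and the function names and sample values are hypothetical.

```python
def firstorder_minimum(voxels):
    """Minimum intensity ('minimum brightness') within the lesion."""
    if not voxels:
        raise ValueError("lesion contains no voxels")
    return min(voxels)


def mean_absolute_deviation(voxels):
    """Mean distance of all voxel intensities from their mean value."""
    if not voxels:
        raise ValueError("lesion contains no voxels")
    mean = sum(voxels) / len(voxels)
    return sum(abs(v - mean) for v in voxels) / len(voxels)


heterogeneous = [10.0, 20.0, 30.0, 40.0]  # wide spread of intensities
homogeneous = [30.0, 30.0, 30.0, 30.0]    # uniform intensities
mad_heterogeneous = mean_absolute_deviation(heterogeneous)
mad_homogeneous = mean_absolute_deviation(homogeneous)
```

A perfectly uniform lesion has a mean absolute deviation of zero, so lower values correspond to the higher brightness homogeneity that the text associates with non-response.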

For lesion response based on additional DECT radiomics, univariate statistics revealed a total of 10 features to be relevant (AUROC ≥0.77, MI≥0.1, R²≥0.12, padj. ≤0.001). The ensuing step-forward selection in multivariate analysis selected a best subset comprising three dual-energy specific features (MeanIodine, IodineConcentrationTotal, MeanMixed) to have the greatest contribution to lesion response classification (p<0.001, adj. R²=0.382, AIC=392.4, BIC=408.3, AUROC=0.88). Overall, individual lesions with this radiomics signature were 3.3 times more likely (OR=3.3, 95% CI 2.7-4.3, p≤0.001) to be non-responders than lesions without the radiomics signature. Figure 3 visualizes the most prominently selected features in univariate and multivariate statistics. The blue background marks the difference between the respective means of responders and non-responders.

Figure 3

Selected features in univariate and multivariate statistics.

Machine learning training (study cohort)

In 10-fold internal cross-validation, the model based on the best subset available in SECT (minimum brightness) had only coincidental classification capabilities (AUROC=0.51) with a sensitivity of 61.17% (95% CI 54.14% to 67.89%) and a specificity of 50.53% (95% CI 43.19% to 57.84%). The model trained on the best DECT radiomics subset for patient response (textural coarseness) also showed only coincidental classification capabilities in 10-fold internal cross-validation (AUROC=0.55) with a sensitivity of 64.08% (95% CI 57.12% to 70.63%) and a specificity of 50.53% (95% CI 43.19% to 57.84%).

For individual lesion response, the model based on the best SECT subset (mean absolute voxel brightness deviation) performed poorly (AUROC=0.61) in 10-fold internal cross-validation. The model’s sensitivity was 65.95% (95% CI 58.63% to 72.74%), and the specificity was 59.46% (95% CI 52.01% to 66.06%). The model trained on DECT radiomics (iodine uptake metrics), on the other hand, performed well in 10-fold internal cross-validation (AUROC=0.81) with a sensitivity of 66.67% (95% CI 59.69% to 73.14%) and a specificity of 78.46% (95% CI 72.02% to 84.01%).

Machine learning application (validation cohort)

Machine learning application and predictive performance calculation in the validation cohort based on the best radiomic subsets showed coincidental accuracy at most for patient response classification by SECT radiomics (AUROC=0.50). With fair response prediction capabilities (AUROC=0.75), DECT radiomics performed significantly better on a patient-based level (p<0.001). The best SECT radiomics subset for lesion response had poor predictive capabilities (AUROC=0.61). With good predictive accuracy (AUROC=0.85), the model for individual lesion response trained on DECT radiomics again performed significantly better in prediction (p<0.001). See figure 4 for further details and metrics.

Figure 4

Machine learning application validation cohort AUROC and metrics. AUROC, area under the receiver operating characteristic curve; DECT, dual-energy CT; SECT, single-energy CT.

Discussion/conclusion

Immune checkpoint inhibitors have revolutionized the realm of melanoma treatment and prolonged PFS and OS.7 However, mixed response to immunotherapy is not uncommon, with some lesions showing complete responsiveness, while others remain stable or are even further progressive under therapy.8 In this study, we identified dual-energy specific radiomic features linked to higher likelihoods of immunotherapy responsiveness on a patient-based level and a lesion-based level.

Several attempts have been made so far to identify predictors for therapy response in patients with metastatic melanoma. Hamberg et al evaluated serum LDH and the tumor marker S-100β for their predictive performance, particularly the latter showing great promise.20 As for CT imaging, several studies proposed using diameter change in a 3-month follow-up window.21 22 While these are important metrics, our scope was to identify predictors visible in baseline DECT for patient and lesion response. This is of particular interest for medical research, as metastatic melanoma in the setting of immunotherapy may show different response patterns that may even complicate classical tumor burden evaluation strategies.23

Radiomic signatures have shown great potential for response prediction in solid tumors. Trebeschi et al reported associations between radiomic features and immunotherapy response in patients with metastatic non-small-cell lung cancer and melanoma.24 In their study, however, response prediction of individual melanoma metastases was challenging. As an approach to tackle this problem, Sun et al proposed a radiomic signature for CD8+ tumor infiltration as a response predictor with promising results.25 Likewise, Ligero et al were able to accurately distinguish response versus non-response in patients with several different primary tumor types (including melanoma) at a sensitivity of ≥73% and a specificity of ≥55%.26 However, the number of patients with melanoma in their cohort was relatively small (n=9). Furthermore, Ligero et al analyzed radiomic features derived from SECT. While machine learning in our study was only capable of coincidental patient response prediction in SECT, training a model on DECT radiomics showed a clear benefit. Lesion response prediction performed only slightly better in SECT than patient response prediction. Using a machine learning model trained on DECT radiomics, though, significantly improved lesion response prediction.

DECT allows for iodine and fat differentiation by the material decomposition principle, adding important clinical data to otherwise only visually evaluable CT images.16 Doda Khera et al reported DECT radiomics to possess a precise differentiation capability of normal liver tissue vs steatosis or cirrhosis.27 Bae et al showed textural features from iodine overlay maps to correlate with tumor aggressiveness in patients with lung adenocarcinoma.28 Concordantly, Choe et al demonstrated textural features derived from iodine overlay maps to reflect perfusion levels associated with higher hazard ratios in patients with resectable lung cancer.29 To the best of our knowledge, the novel method of dual-energy radiomic analysis has never been investigated for melanoma response prediction in WBCT. Considering mixed responses under immunotherapy, this approach can be a powerful additional tool for oncological decision-making. A highly specific lesion response prediction might, after all, help with the early identification of metastases that require different therapeutic methods. As El Naqa et al suggested, radiomic features are highly dependent on image acquisition and reconstruction parameters, emphasizing the need for standardized CT protocols.30 When considering these essential caveats, training machine learning models using the novel method of DECT radiomics can potentially be an important step towards individualized treatment response assessment.

Limitations

This study has several limitations. First, the design of this study was retrospective, which in our case limited inclusion. Furthermore, in this study, only baseline lesions were used for radiomic analysis. However, patients with CR of the baseline lesions might still have been categorized as non-responders due to newly emerging lesions under therapy. A larger patient collective with a cleaner distinction of subgroups might have improved the discriminatory power of dual-energy radiomics even more. Second, although our model’s response prediction was reasonably accurate, image data analysis offers a rather descriptive approach to contexts that are not yet fully understood. Previous studies have pointed out a significant correlation between iodine uptake in lung cancer and markers for metabolism.29 31 In circulating melanoma cell subpopulations, on the other hand, cellular heterogeneity was found to be a significant response predictor.32 Consequently, this study raises the question of whether therapy response correlates with the overall metabolic activity of lesions or whether lesions with higher iodine uptake are composed of a higher number of active cells, which, therefore, have a higher likelihood of cellular heterogeneity. Our study lacked the necessary histopathological validation to illuminate this potential relationship. Third, our results are based on iodine uptake in a portal venous phase, which may be influenced by several factors like the patient’s age, sex, body weight, and cardiac function. Furthermore, radiomic features are highly dependent on image acquisition and reconstruction. We used a high-end third-generation dual-source scanner that is not readily available at every clinical site. Our results may, therefore, not necessarily be accurately reproducible with other setups.

Conclusion

Machine learning models trained using the novel method of DECT-specific radiomics provide a significant additive value over SECT radiomics approaches for response prediction in patients with metastatic melanoma preceding immunotherapy, especially on a lesion-based level. As mixed tumor response is not uncommon in metastatic melanoma, this provides a powerful tool for clinical decision-making and may be an essential step towards individualized medicine.

Ethics statements

Patient consent for publication

Ethics approval

Institutional Review Board approval was obtained from the University of Tuebingen prior to the initiation of this study.

Footnotes

  • Twitter @TeresaSAmaral

  • Contributors ASB: conceptualization, methodology, investigation, formal analysis, data curation, writing-original draft, visualization. FP: data curation, writing-review and editing. HA: data curation, writing-review and editing. SA: data curation, writing-review and editing. TE: conceptualization, funding acquisition, supervision, writing-review and editing. TA: resources, writing-review and editing. SF: methodology, software, validation, writing-review and editing. AFC: methodology, software, validation, writing-review and editing. KN: conceptualization, resources, writing-review and editing. AEO: conceptualization, funding acquisition, writing-review and editing, project administration, guarantor.

  • Funding This study was partially funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, 'DFG'), project number #428216905.

  • Competing interests All authors declare no conflict of interest for this study. SF and AFC are employees of Siemens Healthcare and had no control over the data.

  • Provenance and peer review Not commissioned; externally peer reviewed.
