Abstract
Twenty years after its initial introduction, Response Evaluation Criteria in Solid Tumors (RECIST) remains today a unique standardized tool allowing uniform objective evaluation of response in solid tumors in clinical trials across different treatment indications. Several attempts have been made to update or replace RECIST, but none have realized the general traction or uptake seen with RECIST. This communication provides an overview of some challenges faced by RECIST in the rapidly changing oncology landscape, including the incorporation of PET with 18F-fluorodeoxyglucose tracer as a tool for response assessment and the validation of criteria for use in trials involving immunotherapeutics. The latter has mainly been slow due to lack of data sharing. Work is ongoing to try to address this. We also aim to share our view as statistician representatives on the RECIST Working Group on what would be needed to validate new imaging endpoints for clinical trial use, with a specific focus on RECIST. Whether this could lead to an update of RECIST or replace RECIST altogether depends on the changes being proposed. The ultimate goal remains to have a well-defined, repeatable, confirmable and objective standard as provided by RECIST today.
- Biomarkers, Tumor
- Biostatistics
- Clinical Trials as Topic
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.
Introduction
Twenty years after its initial publication, Response Evaluation Criteria in Solid Tumors (RECIST) remains a unique standardized tool allowing uniform objective evaluation of response in solid tumors across different treatment indications.1 2 This is quite an achievement when considering on the one hand the heterogeneous behavior of solid tumors, and on the other hand, the oncology landscape being in constant development. The last few years have seen newly defined disease settings such as oligometastatic disease; treatment modalities with atypical response patterns, such as immunotherapeutics; new or improved imaging techniques, including new tracers for PET imaging, and new technologies attempting to identify early (lack of) response to treatment, such as ctDNA-based approaches. Many attempts have been made to update or replace RECIST, for instance, in disease-specific settings such as lymphoma,3 4 brain metastases,5 and prostate cancer.6 Still, none have realized the general traction or uptake seen with RECIST.
As statistician representatives on the RECIST Working Group, we are or have been involved in trying to address several questions regarding the role of RECIST in this rapidly changing field. The initial criteria were validated on a warehouse comprising studies in which patients were treated with chemotherapy. When targeted agents became available, a major question was whether RECIST would be able to assess the response of solid tumors to these drugs with the same precision. Simply put, these drugs typically block the growth and spread of cancer by targeting specific molecules that regulate these processes. As a result, their mode of action differs fundamentally from that of classical cytotoxic agents. An extensive analysis of a database pooling individual patient data from 50 clinical trials investigating targeted agents either alone or in combination with classical chemotherapy in different tumor types was performed. It did not show any relevant difference in the performance of RECIST 1.1 for these agents compared with chemotherapeutic medicines.7
Another major challenge the RECIST Working Group has faced in the last ten years has been the incorporation of Positron Emission Tomography (PET) with fluorodeoxyglucose (18F-FDG) tracer beyond its current role of complementing CT scanning to confirm disease progression. 18F-FDG-PET, in practice, has shown itself to be useful in assessing within-patient changes.8 9 It is also used to illustrate the activity, or lack thereof, of new pharmaceutical compounds. Due to a lack of generally accepted standard guidelines, results from different clinical trials have been difficult to compare and the role of PET-CT in a set of standard response criteria remains unclear. To try to integrate this tool into RECIST, we collected data from 18 trials with information on approximately 1000 patients. Of those, only 30% had a sufficient level of information to allow a detailed comparison of baseline and consecutive follow-up imaging data captured based on simultaneous CT and PET scans.10 The lack of harmonization of 18F-FDG-PET protocols, in terms of imaging parameters and endpoints, across the different studies is a major obstacle to obtaining a sufficient level of evidence for 18F-FDG-PET to be incorporated in RECIST.
More recently, the development of immunotherapeutics has brought to light atypical response patterns, whereby in a small subset of patients an initial progression according to RECIST was followed by a late, but durable response. This observation resulted in the publication of several sets of response assessment criteria, such as the immune-related response criteria (irRC, and its simplifications) and irRECIST.11–13 As none have been systematically implemented, results of different trials have been difficult to interpret. This prompted the RECIST Working Group, in collaboration with representatives from the pharmaceutical industry, to publish a set of guidelines for the implementation and data collection of response assessment in trials with immunotherapy, the so-called iRECIST criteria.14 Work is ongoing to validate these criteria in a centralized individual patient database.
Work on other important questions, such as recommendations adapted to pediatric oncology, or a RECIST-like approach to response assessment in brain metastases, is currently ongoing. In these endeavors, one of the major challenges is the availability of clinical and/or imaging data, and sometimes also the willingness to share them.
In 2009, Sargent et al provided guidance on relevant criteria for validating new endpoints for use in cancer clinical trials.15 They specifically focused on imaging endpoints for phase II trials as this is the area in clinical trial research that could benefit the most from having an early imaging biomarker for evaluating new treatments. In summary, these authors postulated that a new endpoint should be accompanied by a sound biological rationale, a standardized protocol for performing the imaging and interpreting imaging measurements, an understanding of its limitations, and evidence of a correlation with a true patient benefit endpoint (aka surrogacy). All these conditions remain valid today. Along these lines, O’Connor et al provided a roadmap for establishing imaging-based biomarkers for screening, diagnosing and staging, patient stratification, and/or early endpoints of efficacy or lack of activity in the context of clinical research and ultimately patient care.16 Yet there has not been a major update of RECIST incorporating any new imaging technique or imaging-based assessment since the publication of V.1.1.
In this communication, we aim to share our view on the major ingredients needed for validating imaging endpoints for use in clinical trials, with specific attention to RECIST. This could consist of anything ranging from integrating a new approach into the ruleset to replacing CT-based rules for RECIST.
Repeatability, reproducibility and generalizability
In 2007, a so-called coffee break experiment was reported, where 33 patients with non-small cell lung cancer (NSCLC) underwent two consecutive chest CT scans approximately 15 min apart.17 These analyses suggested that changes less than 10% could result from variability inherent to imaging, but measurement noise up to 20% was observed in smaller lesions. This analysis supported the RECIST 1.1 rule requiring an absolute increase of at least 5 mm for a 20% increase to be classified as progression.2 In addition, iRECIST requires an increase of at least 5 mm on top of the initial progression for an assessment to be classified as confirmed progressive disease.14
The purpose of this experiment was to assess the variability of tumor measurements under repeat scans to understand better what constitutes a real change in tumor measurements as used to determine response and progression. It is an example of the kind of information that would be needed to support the integration of a new-imaging technique based endpoint into RECIST.
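As an illustration, the measurement thresholds mentioned above can be written down as a minimal Python sketch. It covers only the two numeric rules quoted in the text: a 20% increase combined with an absolute increase of at least 5 mm, and the additional 5 mm required by iRECIST for confirmation. The function names and the choice of a reference sum (assumed here to be the smallest sum of diameters recorded on study) are assumptions made for illustration. New lesions, non-target lesions and all other parts of the criteria are deliberately left out.

```python
def meets_progression_thresholds(current_sum_mm: float,
                                 reference_sum_mm: float,
                                 rel_threshold: float = 0.20,
                                 abs_threshold_mm: float = 5.0) -> bool:
    """Check the two measurement thresholds quoted in the text.

    reference_sum_mm is assumed to be the smallest sum of target-lesion
    diameters recorded on study (an assumption of this sketch).
    """
    if current_sum_mm < 0 or reference_sum_mm < 0:
        raise ValueError("sums of diameters must be non-negative")
    increase = current_sum_mm - reference_sum_mm
    # Both conditions must hold: a relative increase of at least 20%
    # AND an absolute increase of at least 5 mm.
    return (increase >= abs_threshold_mm
            and increase >= rel_threshold * reference_sum_mm)


def meets_irecist_confirmation_threshold(next_sum_mm: float,
                                         sum_at_initial_progression_mm: float,
                                         abs_threshold_mm: float = 5.0) -> bool:
    """Check the further increase of at least 5 mm on top of the initial
    progression (target-lesion measurements only, hypothetical helper)."""
    return next_sum_mm - sum_at_initial_progression_mm >= abs_threshold_mm


if __name__ == "__main__":
    # 10 mm -> 12 mm is a 20% increase but only 2 mm in absolute terms,
    # which is within the range of measurement noise seen in small lesions.
    print(meets_progression_thresholds(12.0, 10.0))
    # 30 mm -> 40 mm exceeds both thresholds.
    print(meets_progression_thresholds(40.0, 30.0))
```

The absolute threshold is what prevents a small lesion from being classified as progressing on the basis of variability inherent to imaging alone.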
Along these lines, the 18F-FDG-PET RECIST working group decided to revisit the test-retest repeatability of quantitative 18F-FDG-PET measurements. This allowed them to formulate recommendations for assessing minimal detectable changes and to investigate how these would change for different tumor types, lesion locations, image acquisition methods, and single-center versus multicenter settings using a meta-analysis involving data from eleven studies identified in the literature.8 They concluded that in a multicenter study, using SUVpeak is recommended and that a decrease in SUVpeak by 30% in a lesion with a baseline of SUVpeak ≥4.0 represents a demonstrable biological change. Together with guidelines such as the ones provided by the European Association of Nuclear Medicine18 for performing, interpreting, and reporting results of 18F-FDG-PET/CT, this can support the collection of a standardized dataset that may serve to investigate whether there is a role of 18F-FDG-PET for response assessment in RECIST.
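The SUVpeak recommendation above can be expressed as a short Python sketch. It encodes only the two numbers given in the text (a relative decrease of 30% and a baseline SUVpeak of at least 4.0). The function name and parameter names are hypothetical, and the sketch does not address how SUVpeak is measured or harmonized across centers.

```python
def is_demonstrable_suvpeak_decrease(baseline_suvpeak: float,
                                     followup_suvpeak: float,
                                     min_baseline: float = 4.0,
                                     min_rel_decrease: float = 0.30) -> bool:
    """Return True when a lesion's SUVpeak decrease meets the thresholds
    quoted in the text for a multicenter setting (illustrative helper)."""
    if baseline_suvpeak <= 0 or followup_suvpeak < 0:
        raise ValueError("SUVpeak values must be positive at baseline "
                         "and non-negative at follow-up")
    # The recommendation applies only to lesions with sufficient
    # baseline uptake.
    if baseline_suvpeak < min_baseline:
        return False
    decrease = baseline_suvpeak - followup_suvpeak
    return decrease >= min_rel_decrease * baseline_suvpeak


if __name__ == "__main__":
    print(is_demonstrable_suvpeak_decrease(8.0, 5.0))  # 37.5% decrease
    print(is_demonstrable_suvpeak_decrease(8.0, 7.0))  # 12.5% decrease
    print(is_demonstrable_suvpeak_decrease(3.0, 1.0))  # baseline below 4.0
```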
Interestingly, while this question remains unaddressed today from the RECIST perspective, several 18F-FDG-PET-based response criteria have been proposed to evaluate immunotherapy response.19 20 PECRIT (PET/CT Criteria for Early Prediction of Response to Immune Checkpoint Inhibitor Therapy) was proposed based on assessing a cohort of 20 patients with advanced melanoma treated with ipilimumab or nivolumab. It includes criteria based on a change in SULpeak (Standard Uptake value normalized by Lean body mass).21 The PET Response Evaluation Criteria for Immunotherapy classification includes considerations on the absolute number of new lesions on 18F-FDG-PET scan based on analysis of 41 patients with advanced melanoma treated with ipilimumab.22 iPERCIST, an adaptation of the PERCIST criteria23 for immunotherapy, describing an intermediate response assessment based on a dual-time-point evaluation, was proposed based on an analysis of 28 patients with NSCLC treated with nivolumab.24 Others proposed imPERCIST as a modification of PERCIST where progressive disease is not defined based on the development of new lesions but rather an increase in the sum of SULpeak by 30%, based on an analysis of 60 patients with advanced melanoma treated with ipilimumab.25 This has recently prompted a joint guideline standardizing the use and interpretation of 18F-FDG-PET/CT during immunotherapy.26
Note that several PET studies have explored the role of molecular imaging of immune checkpoint molecules, such as PD-L1,27 PD-1,28 CTLA-4,29 and LAG-3,30 and tracers targeting CD8.31 Preliminary data are of interest, but these findings will require testing in larger studies to define their role in response prediction. An important question with these developments, however, will be to what extent RECIST will be able to, or should, incorporate and thereby accommodate all possible protocol specificities. Doing so may divert the tool from its initial purpose of standardizing response assessment in solid tumors in clinical trials across different treatment indications.
The previous overview also demonstrates the urgent need for a large-scale validation to provide a standardized approach for imaging in clinical trials. Evidence should extend beyond small, single-center exercises: while these are very informative, they can hardly be considered generalizable. Modifications to RECIST should be supported by data from multiple trials, ideally considering multiple tumor types and treatments with different modes of action.
RECIST V.1.0 was developed by interrogating data from more than 4000 patients from 14 clinical trials and was one of the first initiatives showing the power of data sharing.1 RECIST V.1.1 was supported by analyses of individual patient data from approximately 10,000 patients in 16 chemotherapy trials.2 The targeted agents analysis was performed on a warehouse pooling data from 50 clinical trials on 23,000 patients.7 Nowadays, sharing has become much more difficult, especially in the context of trials investigating immunotherapeutic agents. First, clinical measurements have their limitations, i.e. they can only be used to the extent of what is reported. RECIST V.1.1 based measurements are unidimensional, with a maximum of five lesions per patient, documented only until progression (according to RECIST). This does not allow much room for investigating response patterns after RECIST progression. Ideally, the underlying images would be available to enable more detailed assessments; however, sharing of images (and clinical trial data) has become very difficult partly due to stringent privacy regulations such as the General Data Protection Regulation in Europe that became effective as of May 2018. Finally, some sponsors are concerned about the impact a reinterpretation may have on approved treatment.
This has heavily hampered our efforts to centralize a data warehouse on patients treated with immunotherapeutics, despite the publication14 of standardizing guidelines for clinical trials involving such agents, endorsed by several major pharmaceutical players in the field. So far, analyses comprising multiple trials with immunotherapeutics are limited. In 2020 the Food and Drug Administration (FDA) published the results of a pooled analysis of 14 randomized controlled trials submitted to the organization for registration purposes.32 However, this analysis is limited because a large proportion of patients had no measurements beyond RECIST progression. For a true validation exercise, access to individual patient data with follow-up beyond RECIST progression, as suggested by iRECIST, is crucial. Without measurements beyond RECIST progression, it is difficult to investigate the real prevalence of pseudoprogression and its impact on the assessment of activity of immunotherapeutic compounds.
Fortunately, some solutions can build on the general perception that data sharing precludes unnecessary exposure of patients to irrelevant imaging and treatment, and speeds up new discoveries. There are several data-sharing initiatives currently active, as nicely summarized by Vazquez et al.33 They are mostly US based (eg, clinicalstudydatarequest.com, NCTN/NCORP Data Archive, Project Data Sphere, Vivli), they only contain data on the control arm (eg, Project Data Sphere), or data are only accessible via built-in analysis tools (eg, clinicalstudydatarequest.com, Project Data Sphere). The latter prevents effective pooling of individual patient data from different sources or platforms, a pre-requisite for performing validations for RECIST. Imaging archives in oncology are less frequent. The Cancer Imaging Archive (TCIA), sponsored by the National Cancer Institute, hosts a large publicly available archive of medical images of cancer. EuCanImage is a 4-year research project aiming to build a European imaging platform to enhance research in artificial intelligence (https://eucanimage.eu/). Initiatives supporting federated data sharing approaches such as the Joint Imaging Platform (https://jip.dktk.dkfz.de/jiphomepage/) open the way for a potentially more radiomics-based approach to RECIST. Automatic segmentation could allow tumor load to be measured more reproducibly, without the need to limit to a total number of lesions per organ site. This creates opportunities for a more general use of volumetric assessments to feed tumor growth models in early drug development, as proposed by Maitland et al,34 but also for the evaluation of imaging signatures that can help provide an early readout of response such as the one proposed by Dercle et al for patients with melanoma treated with immunotherapy.35 Federated approaches for statistical individual patient data analysis are less straightforward.
Although some software packages are available for standard statistical analyses, survival models such as the Cox model do not lend themselves well to federated learning.36 As survival analysis models are indispensable in oncology research, this will be an important topic for future research. This approach may, however, not be less resource intensive than classical data sharing, as it will require some IT involvement as well as data preparation on all ends to ensure data harmonization for analysis.
The RECIST way forward
The concepts and rules explained in Sargent et al15 remain applicable. Applied to changes to RECIST, they set a high bar. In what follows, we will discuss the different ways in which RECIST could be refined or improved.
First, we need to recognize that today’s application of RECIST covers a wide range of settings, and it is possible (although undesirable because it would cause divergence) that changes are applied depending on the purpose of use. Indeed, whereas initially intended for use in phase II clinical trials with response to treatment as the primary endpoint, today RECIST is used in early phase trials as a treatment activity indicator, in comparative trials with a response endpoint, and in comparative trials with a progression endpoint (progression-free survival (PFS), time to progression, other time to event endpoints using RECIST progression as a component).
Looking to the future of RECIST, potential changes to the RECIST construct could be categorized as follows. RECIST could be updated by adding new techniques or methodology to what RECIST currently is. Modern imaging techniques such as PET-CT with different tracers could be incorporated provided measurement error can be controlled via standardized protocols as mentioned previously. New technologies such as those based on changes in ctDNA could be included as an early tool to monitor response to treatment. Initiatives such as the ctDNA to Monitor Treatment Response (ctMoniTR) project by the Friends of Cancer Research (https://friendsofcancerresearch.org/ctdna/) could pave the way for a validation that could lead to integration in RECIST.
RECIST can be evaluated for use in other settings than the ones considered today, as already illustrated by the analyses on targeted agents and the attempts to validate iRECIST for assessing response to immunotherapeutic agents. Today, another area of research for the Working Group is the role of RECIST to assess response to treatment in patients with brain metastases. For this project, the Working Group has joined forces with the RANO Brain metastases group to centralize data from a large number of clinical trials of brain metastases.
RECIST could be updated to include new definitions of response or progression, such as pseudoprogression or hyperprogression (as sometimes seen in patients treated with immunotherapy37) or minor responses. New classifications of RECIST could be proposed to capture events such as clinical benefit or clarify the definition of PFS.38
Finally, RECIST could be replaced with a whole new method, satisfying the conditions specified by Sargent et al and O’Connor et al.15 16 In a world where the treatment landscape is in a constant mode of change, and technology to monitor patient response to treatment is changing rapidly, RECIST must evolve as well to remain the well-defined, repeatable, confirmable, and objective standard it is today.
Footnotes
Contributors Both authors contributed equally to the content of this manuscript.
Funding This publication was supported by a donation from Kom op tegen Kanker (Stand up to Cancer), the Flemish cancer society from Belgium.
Competing interests None declared.
Provenance and peer review Commissioned; externally peer reviewed.