Abstract
Objective We developed a validated index for assessing histological disease activity in UC and established its responsiveness.
Methods Two hundred biopsies were scored. The outcome was the Global Visual Evaluation (GVE). Eight histological features were tested. The Nancy index was developed by multiple linear regression and a bootstrap process to create an index that best matched the GVE. Goodness of fit was assessed by the adjusted R squared (adjusted R2). The second step was the validation of the index: 100 biopsies were scored for the Nancy index by three pathologists from different centres. Inter-reader reliability was evaluated for each reader. The relationship between changes in the Nancy index and changes in the Geboes index was examined to assess responsiveness.
Results After backward selection with bootstrap validation, 3/8 items were selected: ulceration (adjusted R2=0.55), acute inflammatory infiltrate (adjusted R2=0.88) and chronic inflammatory infiltrate (adjusted R2=0.79). The Nancy index is defined by a 5-level classification ranging from grade 0 (absence of significant histological disease activity) to grade 4 (severely active disease). The intraclass correlation coefficient (ICC) for the intrareader reliability was 0.88 (95% CI 0.82 to 0.92) and the index had good inter-reader reliability (ICC=0.86 (0.81 to 0.99)). The correlation between the Nancy index and the Geboes score or the GVE was very good. The index had a good responsiveness with a high correlation between changes in the Geboes score and changes in the Nancy index (0.910 (0.813 to 0.955)).
Conclusions A three descriptor histological index has been validated for use in clinical practice and clinical trials.
- ULCERATIVE COLITIS
- MUCOSAL PATHOLOGY
Significance of this study
What is already known on this subject?
Histological activity of disease in UC has prognostic value.1
A validated, responsive and consistent scoring system is needed for international use.
No such scoring system currently exists.
What are the new findings?
A simple three descriptor scoring system for assessing histological disease activity in UC has been created and validated.
The Nancy index shows good intraobserver and interobserver reliability.
The Nancy index shows good responsiveness.
How might it impact on clinical practice in the foreseeable future?
The Nancy index has the potential to become a universal tool for clinical practice and clinical trials.
Introduction
UC presents as continuous colonic mucosal inflammation, starting at the rectum and extending proximally.2–4 Accumulating evidence indicates that histological healing is associated with better clinical outcomes in UC.1 ,5 ,6 It could represent the ultimate therapeutic goal in UC.7 ,8 Assessment of disease activity and severity is important for designing an optimal therapeutic strategy and follow-up of patients with UC in clinical practice and clinical trials.9–11 No fewer than 22 indices for the assessment of histological disease activity have been developed, but none has been fully validated.5 ,12–15 They combine chronic and acute changes, as well as epithelial and inflammatory features.16 Mucosal inflammation is usually graded by means of a scale, composed of different features that are thought to be sensitive in characterising disease activity.17 ,18 The two indices most commonly used for the histological assessment of UC are the Riley index and the Geboes index.5 ,13 The Riley index evaluates six features, each of which is graded subjectively on a 4-point scale and given equal weight, but its operating characteristics have not been examined. The Geboes index includes five features and is the best validated (interobserver variability 0.59–0.70, indicating moderate to good reliability between three expert pathologists), but was not designed to be responsive to change. Other indices have been developed: for Gramlich et al15 and Gupta et al14 the term ‘activity’ refers to an infiltrate of neutrophils into the crypt epithelium, but none was developed through a formal validation process, so their operating properties remain poorly understood.19 Nevertheless, we have shown that correlation between these indices is strong.20
Our aim was to develop and formally validate a new index for the histological assessment of disease activity in UC and to evaluate its responsiveness.
Methods
Phase 1: The Nancy index development
The first step was to create an index through a retrospective study. In the absence of a gold standard reference point for the histological assessment of disease activity in UC we used a Global Visual Evaluation (GVE).20 ,21 The GVE used a 10-point visual analogue scale ranging from 0 (minimal activity) to 10 (maximal activity). Selection of the candidate histological items was based on preliminary work that studied the intrareader and inter-reader variability for five widely used histological UC activity indices (Geboes index, Riley index, Gramlich index, Gupta index and GVE).20 Defined items with good intrareader reliability and inter-reader reliability (κ value >0.6) were selected:22 these items were ‘Chronic inflammatory infiltrate’, ‘Neutrophils in epithelium’, ‘Ulceration’, ‘Acute inflammatory cell infiltrate’, ‘Mucin depletion’ and ‘Neutrophils in lamina propria’. Two other items were also selected, based on expert opinion and literature review: basal plasmacytosis and serrated architectural abnormalities (table 1).
This study examined biopsies from the first 60 patients of the 2012 Nancy IBD cohort who had a colonoscopy or proctosigmoidoscopy with an established diagnosis of UC. Two hundred biopsies from these 60 patients, obtained between 2003 and 2014 at the Nancy University hospital, were analysed.
The 200 biopsy specimens were scored by one pathologist (AM-B). First, the GVE was scored for the 200 biopsy specimens. Second, the eight items were scored, with the reading order of the items randomised using a set of Latin squares. The Nancy index was then constructed to best represent the GVE, incorporating a subset of the eight items that could be tested for intrareader reliability.
Phase 2: Validation of the Nancy histological index
Reliability and measurement error
For the analysis of the intrareader reliability, a second separate reading was performed by the pathologist AM-B, who scored the subset of items constituting the Nancy index. One hundred biopsies were selected by simple random sampling among the 200 biopsy specimens and were scored blind to the first reading. Two months elapsed between the two readings to limit recall of the first reading.
One hundred anonymised biopsy specimens from 100 patients with UC (collected between 2009 and 2014) constituted the validation cohort. All biopsy specimens came from the Laboratory of Pathology of Reims University Hospital. In this phase, the subset of items constituting the Nancy index and the GVE were scored by three pathologists from three French centres: CB-R (Centre Hospitalier Universitaire (CHU) of Reims), CB (CHU of Nancy), and VC (Centre Hospitalier Sud Francilien, Corbeil-Essonnes). To ensure homogeneous histological assessment of the biopsy specimens, training was organised between the central pathologist (AM-B) and the three other pathologists, including teleconferences, presentation and description of all histological parameters. The order of scoring of the items constituting the Nancy index and the Geboes index was randomised using a set of Latin squares. Subsequently, the GVE was scored blind to the items. The pathologists assessed GVE independently from the histological features that constitute the Nancy histological index. The interval between assessment of GVE and the other histological criteria was at least 1 month.
Responsiveness
Thirty patients with UC who had mucosal biopsies from two consecutive endoscopic procedures that showed a change in histological disease activity between the two biopsies were retrospectively analysed. The change in histological disease activity was defined by two consecutive endoscopic examinations with at least a difference of 1 point according to the Geboes conversion 9.23 A total of 60 anonymised biopsy specimens from these 30 patients were scored with the Nancy index by the central pathologist (AM-B). The order of reading for the biopsies was randomised. Subsequently, the GVE was scored blind to the index items.
Statistics for the development phase
The association between GVE and each item was first investigated using multiple linear regressions. Items were considered as categorical variables, so that the contribution of each level for each item could be explored separately. Each item having k levels was first transformed into k−1 binary variables, with the lowest level as reference. Results for these bivariate analyses were expressed as the regression coefficients of each level, corresponding to the mean change in the response variable compared with the reference level. The suitability of bivariate models was assessed by plotting least-squares means and examining residual plots. For each item, the goodness of fit was assessed with the adjusted R squared. A multiple linear regression was performed on all items having a p value <0.1 in bivariate analysis (full model). The full model was then simplified using multiple linear regression with backward selection at the level p=0.05. The stability of the selected model was investigated using the bootstrap resampling method with 1000 replicates.24 Bootstrap resampling is a method to generate replicates of the initial data set for the multivariable analysis. An individual item was kept in the final model if it was selected in at least 70% of these 1000 analyses. When at least one modality was selected by the bootstrap procedure, the corresponding categorical variable was retained for the simplified predictive model. Multiple linear regression was performed using the items selected. The adjusted R squared was computed. The Nancy index was obtained from the coefficients of this multiple linear regression. Simplification of the model was performed in two steps: (1) the regression coefficients of the multiple linear regression were rounded to the nearest integer; (2) some levels were pooled according to their clinical relevance.
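The bootstrap retention rule described above can be sketched as follows. This is an illustration only, not the SAS procedure used in the study: the function name `bootstrap_inclusion`, the `select` callback and the toy data are hypothetical, and `toy_select` merely stands in for the backward selection at p=0.05.

```python
import random
from collections import Counter

def bootstrap_inclusion(rows, select, n_boot=1000, threshold=0.70, seed=0):
    """Return (frequencies, retained items) across bootstrap replicates.

    rows   : list of observations, in whatever structure `select` expects
    select : callable taking a resampled list of rows and returning the
             set of item names kept by the selection procedure
    """
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select(sample)))
    freq = {item: c / n_boot for item, c in counts.items()}
    # Keep an item only if it was selected in at least `threshold` of replicates.
    retained = sorted(item for item, f in freq.items() if f >= threshold)
    return freq, retained

def toy_select(sample):
    # Hypothetical stand-in for backward selection: always keeps
    # "ulceration"; keeps "mucin_depletion" only when the resampled
    # mean GVE exceeds 6, which is rare for the toy data below.
    kept = {"ulceration"}
    if sum(r["gve"] for r in sample) / len(sample) > 6:
        kept.add("mucin_depletion")
    return kept

rows = [{"gve": g} for g in (1, 2, 3, 4, 5, 6, 7, 8)]
freq, retained = bootstrap_inclusion(rows, toy_select, n_boot=200)
```

In a real analysis `select` would refit the regression on each replicate and return the levels that survive backward elimination; a categorical item would then be retained as soon as one of its levels reaches the 70% threshold.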
The relationship between the Nancy index and GVE was investigated using the Pearson correlation coefficient; a coefficient >0.8 indicates a strong correlation.
Sample size justification is presented as online supplementary material.
Statistics for the validation phase
According to the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist,25 validation of the score was performed on three quality domains: reliability, construct validity and responsiveness. As recommended by COSMIN,25 sample size justification for each step is presented as online supplementary material.
Reliability
Intrareader reliability was assessed from the duplicate readings of 100 biopsy specimens with the computation of the intraclass correlation coefficient (ICC). ICCs were estimated for the Nancy index and for each of the items based on a one-way random effects model using the restricted likelihood method since the resulting ICC may be regarded as the most general statistic for reliability.26 Inter-reader reliability of the Nancy index and its component items was evaluated by the ICC. These ICCs were estimated with a two-way random effects model using the restricted likelihood method. The cluster bootstrap method with 2000 replicates was used to generate associated two-sided 95% CI. According to the terminology suggested by Landis and Koch,22 an intraclass correlation coefficient of 0.6–0.8 shows substantial reliability and 0.8–1.0 is almost perfect reliability. Measurement errors (residuals) and variance components for individual items and the Nancy index are presented to allow for interpretation of the impact of the individual components of the ICC. The Geboes score was transformed with ranks into a continuous parameter with a range from 0 to 30 and ICC was computed on the ranks of the Geboes score.
For internal consistency, as recommended by COSMIN,25 Cronbach's coefficient α, using partial correlation coefficients, was calculated for the overall Nancy index and for the index with one-at-a-time item deletion. Nunnally and Bernstein27 suggest 0.70 as an acceptable reliability coefficient; smaller reliability coefficients are seen as inadequate.
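For readers unfamiliar with the ICC, a minimal sketch of the one-way random effects coefficient for single ratings is given below. It uses the classical ANOVA mean-squares estimator rather than the restricted likelihood estimation described above, so it is an approximation of the approach and not the study's computation; the function name and the toy ratings are illustrative.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    ratings: list of rows, one row per biopsy, each row holding the k
    scores given to that biopsy (k must be the same for every row).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-biopsy and within-biopsy mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Toy data: six specimens, each scored four times.
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_toy = icc_oneway(toy)
icc_perfect = icc_oneway([[0, 0], [2, 2], [4, 4]])
```

The toy ratings disagree strongly within each specimen, so the coefficient is low; identical repeated scores give a coefficient of 1. The two-way model used for inter-reader reliability additionally separates a reader effect and is not shown here.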
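As a hedged illustration of the internal consistency analysis, the sketch below computes the standard (raw) Cronbach's α and the α obtained when each item is deleted in turn. It does not reproduce the partial-correlation variant mentioned above, and the function names and toy scores are assumptions made for the example.

```python
from statistics import variance

def cronbach_alpha(items):
    """Raw Cronbach's alpha.

    items: list of k item-score lists, each of length n (one score per
    biopsy), with k >= 2.
    """
    k = len(items)
    # Total score per biopsy is the sum of its item scores.
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def alpha_if_deleted(items, names):
    """Alpha recomputed with each named item removed in turn."""
    return {
        name: cronbach_alpha(items[:i] + items[i + 1:])
        for i, name in enumerate(names)
    }

# Toy scores for two items on four biopsies.
alpha_two = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])

# Three identical items are perfectly consistent.
same = [[0, 1, 2, 3]] * 3
alpha_same = cronbach_alpha(same)
deleted = alpha_if_deleted(same, ["ulceration", "acute", "chronic"])
```

An item whose deletion lowers α is contributing to the consistency of the scale; an item whose deletion raises α is not.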
Construct validity
For each reader, the relationship between the Nancy index and the GVE or the Geboes score was assessed by computing the Pearson correlation coefficient as recommended by COSMIN.25 This analysis was performed on the ranks of the Geboes score. For this analysis, the GVE and the Geboes score of the most experienced reader (the longest-serving pathologist) were considered.
Responsiveness
The index difference between the two biopsies was described by mean and SD according to the Geboes index, the Nancy index and the GVE. According to COSMIN,25 relationships between changes in the Nancy index and changes in other indexes (GVE and Geboes index) were investigated with Pearson's correlation coefficient, and the 95% CI was computed using Fisher's Z-transform. For this analysis, the Geboes score was transformed with ranks into a continuous parameter with a range from 0 to 30.
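The confidence interval for a Pearson correlation based on Fisher's Z-transform can be sketched as below. The function name is illustrative, and the optional `bias_adjust` term (subtracting r/(2(n−1)) on the z scale, as some statistical packages offer) is an assumption: the text does not state whether such an adjustment was applied.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95, bias_adjust=False):
    """Confidence interval for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)  # Fisher's z-transform of r
    if bias_adjust:
        # Optional small-sample bias correction (assumed, see lead-in).
        z -= r / (2 * (n - 1))
    se = 1 / math.sqrt(n - 3)  # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2)
    # Back-transform the limits to the correlation scale.
    return math.tanh(z - q * se), math.tanh(z + q * se)

lo, hi = pearson_ci(0.910, 30)
lo_adj, hi_adj = pearson_ci(0.910, 30, bias_adjust=True)
```

With r = 0.910 and n = 30, the unadjusted interval is roughly 0.82 to 0.96, and the adjusted one is slightly lower; both are close to the interval reported for the correlation between changes in the Nancy and Geboes indices.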
All analyses were performed using the SAS software V.9.3 (SAS Institute, Cary, North Carolina, USA). The significance level was set at 0.05.
Results
Phase 1: The Nancy histological index development phase
Two hundred biopsy specimens from 60 patients with UC were reviewed. The mean (95% CI) of the GVE was 4.66 (4.23 to 5.09). The distribution of the GVE on the 10-point visual analogue scale for each item and each level is shown in figure 1. Some items (serrated architectural abnormalities, chronic inflammatory infiltrate, acute inflammatory cell infiltrate, ulceration or presence of neutrophils in lamina propria) appeared to provide discrimination for GVE, whereas others (basal plasmacytosis, mucin depletion and neutrophils in epithelium) discriminated only at lower levels of the GVE.
Construction of the Nancy histological index
The bivariate analyses demonstrated that all eight candidate items were predictive of GVE. The predicted mean GVE and SD for each level of each item are presented in table 2. The presence of ulceration corresponded to a mean increase in the GVE of 5.31 (SE: 0.35). A mild but unequivocal increase in the chronic inflammatory infiltrate corresponded to a mean increase in the GVE of 1.19 (0.52), whereas a marked increase of the chronic inflammatory infiltrate corresponded to a mean increase of 7.31 (0.50).
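The coefficients quoted above are differences in mean GVE relative to the reference level of each item. For a single categorical item coded as indicator variables, ordinary least squares reproduces the group means, which the following sketch makes explicit. The function name, the integer level codes and the toy values are illustrative and are not the study data.

```python
from collections import defaultdict

def level_effects(levels, gve, reference=None):
    """Regression coefficients of GVE on one dummy-coded item.

    With an intercept and k-1 indicator variables, least squares fits
    the group means, so each coefficient equals the mean GVE at that
    level minus the mean GVE at the reference level.
    """
    groups = defaultdict(list)
    for lvl, y in zip(levels, gve):
        groups[lvl].append(y)
    if reference is None:
        reference = min(groups)  # lowest level as reference
    means = {lvl: sum(ys) / len(ys) for lvl, ys in groups.items()}
    intercept = means[reference]
    coefs = {
        lvl: m - intercept for lvl, m in means.items() if lvl != reference
    }
    return intercept, coefs

# Toy item with three levels (0 = reference) and toy GVE values.
intercept, coefs = level_effects([0, 0, 1, 1, 2, 2], [1, 2, 3, 4, 8, 9])
```

Here the intercept is the mean GVE at level 0 and each coefficient is the mean increase in GVE at the corresponding level.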
After backward selection with bootstrap validation, three items were selected (table 2): chronic inflammatory infiltrate, acute inflammatory cell infiltrate and ulceration.
From the results of this multivariate analysis, regression coefficients were rounded to the nearest integer in order to simplify the score. The two higher levels of chronic inflammatory infiltrate were rounded to 3 and were pooled in a single level.
For each value of predicted severity (GVE range 0–9), the levels of the three items composing the simplified index are presented in table 3. The predicted GVE is ordinal, containing eight ranks ranging from 0 to 9. Some values of GVE are not possible with this index. For example, a GVE of 2 cannot be observed, because a mild acute inflammatory cell infiltrate cannot be observed without a chronic inflammatory infiltrate, and ulceration cannot be observed without both a chronic inflammatory infiltrate and an acute inflammatory cell infiltrate. A final simplification based on clinical relevance was performed by pooling some levels to achieve a range 0–4 (table 3).
The correlation coefficient between GVE and the Nancy index was 0.961 (0.948 to 0.970).
Figure 2 summarises the Nancy index with five grades. The presence of ulceration on a biopsy specimen corresponds to severely active disease (grade 4 according to the Nancy index). If there is no ulceration on the biopsy specimen, the acute inflammatory cell infiltrate (presence of neutrophils) is assessed: a moderate or severe acute inflammatory cell infiltrate corresponds to grade 3 ‘moderately active disease’, while a mild acute inflammatory cell infiltrate corresponds to grade 2 ‘mildly active disease’. If there is no acute inflammatory cell infiltrate, the chronic inflammatory infiltrate (lymphocytes and plasmacytes) is assessed: a biopsy specimen showing moderate or marked chronic inflammatory infiltrate corresponds to grade 1 ‘chronic inflammatory infiltrate with no acute inflammatory infiltrate’ and a biopsy specimen showing mild or no chronic inflammatory infiltrate corresponds to grade 0 ‘absence of significant histological disease’.
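The stepwise grading just described can be written as a short decision rule. The sketch below is illustrative: the function name and the integer codes for the acute and chronic infiltrates are assumptions chosen for the example, not the published item scales.

```python
def nancy_grade(ulceration, acute, chronic):
    """Nancy index grade (0-4) from the three histological items.

    ulceration : bool, True if ulceration is present
    acute      : 0 = none, 1 = mild, 2 = moderate or severe
    chronic    : 0 = none, 1 = mild, 2 = moderate or marked
    (integer codes are illustrative, see lead-in)
    """
    if ulceration:
        return 4  # severely active disease
    if acute >= 2:
        return 3  # moderately active disease
    if acute == 1:
        return 2  # mildly active disease
    if chronic >= 2:
        return 1  # chronic infiltrate with no acute infiltrate
    return 0      # absence of significant histological disease

grades = [
    nancy_grade(True, 0, 0),
    nancy_grade(False, 2, 0),
    nancy_grade(False, 1, 2),
    nancy_grade(False, 0, 2),
    nancy_grade(False, 0, 1),
]
```

The items are evaluated in order of severity, so the chronic infiltrate only determines the grade when neither ulceration nor neutrophils are present.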
Phase 2: Validation of the Nancy index
There were no missing data for the GVE, the Geboes index and the Nancy index.
Reliability
For 100 randomly selected biopsies, the value of ICC for the intrareader reliability was 0.880 (0.816 to 0.924). The intrareader reliability for individual items and for the Nancy index is reported in table 4.
The three items constituting the Nancy index and the GVE were then scored by three pathologists from three different centres on 100 biopsy specimens from 100 patients with UC. There were no missing data. Inter-reader reliability is summarised in table 4. The Nancy index showed very good interobserver reliability (ICC=0.865 (0.813 to 0.998)). The greatest reliability (almost perfect or very good) was obtained for the item ‘ulceration’ (ICC=0.865 (0.750 to 1)). Good inter-rater reliability was obtained for ‘chronic inflammatory cells’ (ICC=0.750 (0.643 to 1)) and ‘acute inflammatory cells’ (ICC=0.772 (0.704 to 0.940)).
After checking the inter-reader reliability for GVE (ICC=0.839 (0.770 to 0.888)) and for the Geboes score (ICC=0.847 (0.781 to 0.894)), the GVE and the Geboes scores of the most experienced reader (reader 3) were selected.
Cronbach's coefficient α was 0.861 for the Nancy index for reader 3. One-at-a-time deletion of items indicated that each item contributed positively to the Nancy index (ulceration, 0.886; acute inflammatory infiltrate, 0.775; chronic inflammatory infiltrate, 0.879).
The construct validity with the hypothesis testing was investigated with the correlation between the GVE of reader 3 and the Nancy index scored by each reader. The correlation was very strong for each reader (correlation coefficient=0.876 (0.819 to 0.914) for reader 1, 0.819 (0.739 to 0.873) for reader 2 and 0.874 (0.816 to 0.913) for reader 3).
The correlation between the Geboes score of reader 3 and the Nancy index was also very strong for each reader (correlation coefficient=0.899 (0.853 to 0.931) for reader 1, 0.843 (0.773 to 0.891) for reader 2 and 0.939 (0.909 to 0.958) for reader 3).
Responsiveness
Two sets of 30 biopsy specimens taken from 30 patients with UC at two different time points were reviewed. The median time between the two biopsies was 451 days (range 41–1169 days). There were no missing data for GVE, Geboes index and the Nancy index. All patients had at least a difference of 1 point according to the Geboes conversion 9 between the two biopsies. The mean change (SD) was −2.53 (1.10) for the Nancy index, −4.83 (2.13) for the GVE and −15.86 (7.26) for the Geboes index. Pearson's correlation coefficient between changes in the Nancy index and changes in GVE was 0.886 (0.766 to 0.943). Pearson's correlation coefficient between changes in the Nancy index and changes in Geboes index was 0.910 (0.813 to 0.955).
Discussion
In patients with UC histological healing has emerged as a major therapeutic goal.23 ,28 ,29 Acute inflammatory indicators are associated with a twofold to threefold increased risk of colitis relapse during 12 months follow-up5 and basal plasmacytosis predicts UC clinical relapse in patients with complete mucosal healing.6 However, before assessment of histological disease activity can be appraised and ultimately accepted as a useful metric in clinical research or clinical practice, a validated instrument to measure histological disease activity is needed.
We have developed the first validated index for the assessment of histological disease activity in UC, namely the Nancy histological index. The Geboes index is most commonly used for the assessment of histological disease in UC.8 ,22 Our aim was to develop an index including the items explaining most of the disease activity. This is the methodology that has been used so far to develop all validated indexes now considered as gold standard in IBD trials, such as patient-reported outcomes-2 (PRO2) and ulcerative colitis endoscopic index of severity (UCEIS). The main advantages of the Nancy index are its simplicity and practicality.
The Nancy index is composed of three histological items defining five grades of disease activity: absence of significant histological disease (grade 0), chronic inflammatory infiltrate with no acute inflammatory infiltrate (grade 1), mildly active disease (grade 2), moderately active disease (grade 3) and severely active disease (grade 4) (figure 2). Grade 0 corresponds to the absence of significant histological disease. Grade 1 corresponds to the lack of mucosal neutrophils, a pivotal marker of disease activity, even though moderate or severe chronic inflammation can be present. The last three grades correspond to active disease as defined in the literature.14 ,15 Two of the three histological items showed the best intraobserver and interobserver reliability among the items composing the four widely used histological UC activity indices.20 ‘Ulceration’ showed the best inter-reader reliability (ICC=0.86, almost perfect). Ulceration is assessed as present or absent,15 in contrast to the grading of ulceration used by Geboes et al.13 Acute inflammatory cell infiltrate, as defined by Riley et al,5 showed good reliability (ICC=0.77), although the quantification by Riley into four grades appears unnecessary. Our study found that three grades (absent, mild, or moderate to severe) were sufficient. Defining acute inflammation by the location of neutrophils13 complicates disease activity assessment. The third item, ‘chronic inflammatory infiltrate’, as defined by Geboes, includes assessment of plasmacytes. Although chronic inflammation is by definition not a histological criterion of disease activity, Bessissow et al6 showed that the presence of basal plasmacytosis (p=0.007) predicted disease relapse. Mosli et al29 demonstrated that the greatest intrareader and inter-reader ICC was obtained for the detection of chronic inflammatory infiltrate defined by Geboes, while the lowest ICC was observed for basal plasmacytosis. In the development of the current index we demonstrated that scoring basal plasmacytosis did not improve the sensitivity of the index.
The main advantages of the Nancy index are its very good intraobserver and interobserver reliability, its simplicity and ease of use. It was tested by different pathologists in three centres. The three histological criteria are easy to assess for a pathologist, with the technical advantage that good orientation of the biopsy specimen is not essential for the assessment of the criteria.
Finally, we have shown good results for detecting change. Forthcoming studies will define the predictive value of this validated scoring system in assessing outcomes in UC and its correlation with endoscopic scoring. A validated index for the assessment of histological activity is now available for use in clinical practice and clinical trials.
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
- Data supplement 1 - Online supplement
Footnotes
JS and CB-R contributed equally.
Contributors AM-B, research, analysis and interpretation of data, drafting of the manuscript; CB, VC, CB-R research; JS, analysis and interpretation of data, drafting of the manuscript; M-DD and GC, data collection; SD, WR, SS, interpretation of data and critical review of the manuscript for intellectual content; ST, interpretation of data and drafting of the manuscript; LP-B, conception and design of study, analysis of data, drafting of the manuscript, study supervision.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement We agree to share data.