Abstract
Background In clinical trials, standard manual histopathology assessment by a single pathologist makes it difficult to accurately quantify biomarkers expressed by both tumor and immune cells, owing to inter-pathologist variability and non-exhaustive cell evaluation. Artificial intelligence (AI) models offer quantitative, reproducible solutions for optimized patient characterization in precision therapeutics but typically require large sample sizes for development. Here, we investigated whether a digital image analysis pipeline could be developed to analyze non-small cell lung cancer (NSCLC) samples from a first-in-human (FIH) Phase I clinical trial, despite limited sample numbers and variable tissue content and pre-analytical parameters, and whether it could address the challenges outlined above for (1) hematoxylin and eosin (H&E)-based tumor microenvironment (TME) characterization and (2) scoring of aryl hydrocarbon receptor (AhR) expression in serial immunohistochemistry (IHC)-stained sections.
Methods The study used 25 H&E and 24 IHC (AhR) NSCLC biopsies (NCT04069026, following standard ethical guidelines). Pre-trained NSCLC models for H&E-based tissue segmentation (carcinoma, stroma, necrosis, ‘other’) and cell detection and classification (carcinoma vs ‘other’ cells) were optimized on 20 slides and evaluated on five hold-out slides. AI models were developed for IHC-based cell detection and classification (carcinoma vs ‘other’) and nuclear AhR expression prediction (negative, weak, moderate, strong). Pathologists provided training annotations using detailed guidelines. Evaluation annotations in pre-defined regions were withheld from training. AhR scores and spatial statistics were derived via automated TME analysis. Inter-rater variability among pathologists was assessed for the IHC-based cell classification and AhR scoring tasks.
Results The H&E tissue segmentation and cell classification models demonstrated mean F1 scores of 0.92 and 0.83, respectively. The IHC-based cell classification model achieved an F1 score of 0.90. The AhR expression model exhibited an F1 score of 0.93 for ‘AhR present vs not present’ and a mean F1 score of 0.80 across all intensity categories. Inter-rater agreement among three pathologists was moderate for carcinoma classification (Krippendorff’s alpha [KA] 0.77). For AhR scoring, agreement was moderate for nuclear AhR negativity (KA 0.77) and weaker for the other categories (KA: weak expression 0.42, moderate expression 0.32, strong expression 0.69).
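For readers unfamiliar with the two metrics reported above, the following is a minimal Python sketch of how a mean per-class F1 score and Krippendorff's alpha for nominal labels can be computed. It is illustrative only and is not the study's evaluation code: the function names, the label strings, and the toy data are hypothetical, and it assumes that "mean F1" is an unweighted average over classes, which the abstract does not state.

```python
from collections import Counter
from itertools import permutations


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in the truth or the predictions."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def mean_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one list of ratings per rated item (for example, one cell),
    with one entry per rater; None marks a missing rating.
    """
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two ratings carry no pairing information
        # Every ordered pair of ratings from different raters, weighted by 1/(m - 1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # only one category was used; alpha is undefined
    return 1.0 - (n - 1) * observed / expected


# Toy data (hypothetical): model predictions vs. reference labels for four nuclei.
truth = ["negative", "negative", "weak", "strong"]
preds = ["negative", "weak", "weak", "strong"]
toy_mean_f1 = mean_f1(truth, preds)

# Toy data (hypothetical): two raters labelling three cells.
toy_alpha = krippendorff_alpha_nominal(
    [["carcinoma", "carcinoma"], ["other", "other"], ["carcinoma", "other"]]
)
```

A binary score such as ‘AhR present vs not present’ can be obtained with the same functions by first collapsing the weak, moderate, and strong labels into a single "present" label; the same collapsing (one category vs the rest) is one way to obtain per-category alpha values, although the abstract does not specify how these were derived.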
Conclusions The inter-rater variability results underscore how widely individual interpretations of AhR expression can differ, emphasizing the need for a precise, automated methodology. Systematic model development and data curation enabled fast, reproducible, and exhaustive analysis, despite the limited number of biopsies typical of FIH studies. This data-efficient, tissue-sparing approach allows rapid profiling and quantification of AhR-expressing cell populations in clinical samples, potentially increasing the efficiency of AhR inhibitor development.
Acknowledgements Editing support was provided by Rachel Fairbanks, BA (Hons), Complete HealthVizion, IPG Health Medical Communications, and was funded by Bayer AG.
Ethics Approval This study was approved by all participating institutions’ ethics boards; approval numbers are available on request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.