Abstract
Background Recent progress in high-quality sequencing of circulating nucleic acids has made liquid biopsy an efficient approach for monitoring tumor evolution and therapy response. Cell-free RNA (cfRNA) from blood and other biofluids contains a tumor-derived fraction [1], offering a minimally invasive tool for characterizing tumor-related transcriptomic states. Here, we present a comprehensive machine learning (ML)-driven platform that analyzes blood-derived cfRNA to infer clinically important features and biomarkers of malignancies.
Methods cfRNA was extracted from 4 mL of double-spun plasma (n = 232 healthy donors and n = 168 cancer cases: 92 breast, 36 lung, 23 pancreatic, and 17 colorectal). NGS libraries were prepared according to the Agilent XT HS2 protocol using the V8+UTR exome-wide panel. Tumor-specific mutations were called from cfRNA with Pisces 5.2 and samtools mpileup. The abundance of transcripts from cancer-specific signatures was analyzed using gene set enrichment analysis (GSEA) and single-sample GSEA (ssGSEA). Decision tree-based ML models were trained on artificial data generated from publicly available bulk RNA-seq data from cancer cells, tissues, and sorted cells collected from the GEO database. The models were tested on real cfRNA sequencing data (n = 232 healthy, n = 168 cancer cases).
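To illustrate the single-sample enrichment step, the following is a minimal Python sketch of a rank-based enrichment score in the style of single-sample GSEA. It is not the pipeline used in this study: the function name `ssgsea_score`, the weighting exponent `alpha = 0.25`, and the toy gene names and expression values are all assumptions made for illustration.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict mapping gene name -> expression value (hypothetical input)
    gene_set:   iterable of gene names forming the signature
    alpha:      exponent applied to ranks when weighting signature genes
    """
    gene_set = set(gene_set)
    # Order genes from highest to lowest expression.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    # The most highly expressed gene receives the largest rank (n).
    ranks = {gene: n - i for i, gene in enumerate(ordered)}

    hits = [gene in gene_set for gene in ordered]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must cover some, but not all, measured genes")

    hit_total = sum(ranks[g] ** alpha for g, h in zip(ordered, hits) if h)
    miss_step = 1.0 / (n - n_hit)

    p_hit = 0.0   # weighted running fraction of signature genes seen so far
    p_miss = 0.0  # running fraction of non-signature genes seen so far
    score = 0.0
    for gene, is_hit in zip(ordered, hits):
        if is_hit:
            p_hit += ranks[gene] ** alpha / hit_total
        else:
            p_miss += miss_step
        # Sum the difference between the two running fractions.
        score += p_hit - p_miss
    return score


# Toy sample with made-up gene names and values (not study data).
sample = {"G1": 10.0, "G2": 9.0, "G3": 8.0, "G4": 1.0, "G5": 0.5, "G6": 0.1}
score_high = ssgsea_score(sample, {"G1", "G2"})  # signature genes at the top: positive
score_low = ssgsea_score(sample, {"G5", "G6"})   # signature genes at the bottom: negative
```

In this sketch, a signature whose genes sit near the top of the ranked expression profile yields a positive score, and one whose genes sit near the bottom yields a negative score, which is the property that makes such scores comparable across samples.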
Results We developed robust protocols for plasma-derived cfRNA extraction and NGS library preparation that enable reproducible interpatient and intrapatient cfRNA transcriptome profiling (figure 1). cfRNA profiles from cancer patients contained mRNA transcripts carrying tumor-specific hotspot mutations, with a trend toward a moderate positive correlation between tumor and cfRNA variant allele frequencies (VAFs) that did not reach statistical significance (R = 0.41, p = 0.064; figure 2). The profiles were also enriched in epithelial, epithelial-mesenchymal transition, senescence, and angiogenesis signatures (figure 3). We employed an ML-driven approach to infer tumor-specific characteristics from the tumor-derived cfRNA fraction for breast, colorectal, lung, and pancreatic cancers. When tested on clinical patient samples, ML models trained on artificial cfRNA transcriptomes detected breast cancer status (AUC = 0.73 ± 0.05, n = 153) and tumor microenvironment fibrosis (AUC = 0.80 ± 0.07, n = 44), and predicted PD-1 (AUC = 0.71 ± 0.03, n = 78) and liver metastasis (AUC = 0.70 ± 0.02, n = 143) (figure 4).
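The two kinds of summary statistic reported in the Results, a Pearson correlation between tumor and cfRNA VAFs and an area under the ROC curve (AUC), can be computed as in the following sketch. The function names and all numbers are hypothetical and are not study data. VAF is taken here as alternate-allele reads divided by total reads at a site, which is the usual definition but an assumption about how it was computed in this work.

```python
import math


def vaf(alt_reads, total_reads):
    """Variant allele frequency: alternate-allele reads / total reads at the site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 0/1; ties between a positive and a negative score count as 0.5.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical read counts for three variants seen in both tumor and cfRNA.
tumor_vafs = [vaf(12, 100), vaf(30, 100), vaf(45, 100)]
cfrna_vafs = [vaf(2, 50), vaf(9, 50), vaf(10, 50)]
r = pearson_r(tumor_vafs, cfrna_vafs)

# Hypothetical classifier scores for two negative and two positive samples.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75: 3 of 4 pairs ranked correctly
```

The pairwise formulation of the AUC is quadratic in the number of samples, which is acceptable at the cohort sizes quoted here; a rank-based implementation would be preferable for much larger datasets.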
Conclusions The presented cfRNA-based platform offers unprecedented insight into tumor biology compared with liquid biopsy assays used in current clinical practice. The proposed platform is universal and can potentially characterize any tumor-associated process accompanied by transcriptomic changes that are reflected in the cfRNA fraction.
Reference
1. Larson MH, Pan W, Kim HJ, et al. A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection. Nat Commun 2021;12:2357.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.