1296 Reconstructing gene expression from clinical and genetic panel data for predictions of tumor microenvironment features and response to immune checkpoint inhibitor therapy
  1. Felicia Kuperwaser1,
  2. Sunil Kumar2,
  3. Dillon Tracy3,
  4. Jeff Sherman3,
  5. Andrey Chursov3,
  6. Maayan Baron3 and
  7. Emily Vucic3
  1. 1Zephyr AI, New York, NY, USA
  2. 2ZephyrAI.Bio, Pittsburgh, PA, USA
  3. 3Zephyr AI, McLean, VA, USA
  • Journal for ImmunoTherapy of Cancer (JITC) preprint. The copyright holder for this preprint are the authors/funders, who have granted JITC permission to display the preprint. All rights reserved. No reuse allowed without permission.


Background The development of immune checkpoint inhibitor (ICI) therapy has fundamentally changed the landscape of cancer treatment. Although ICIs have shown remarkable efficacy across diverse cancer types, the majority of cancer patients do not respond to these therapies.1 Tools that better identify patients who would benefit from ICI therapy are urgently needed to facilitate personalized care. Models of ICI response that incorporate tumor microenvironment (TME) features in addition to molecular data have demonstrated improved power to predict patient response to therapy.2 3 These features reflect the coordinated activity of multiple cell types and are therefore best captured by mRNA expression. However, transcriptional profiles are not readily assayed in clinical settings. Extracting TME features from molecular data already collected in clinical settings offers a way to bridge the gap between predictive models that rely on these features and their translation into clinical practice.

Methods We developed a machine learning (ML) model that reconstructs tumor gene expression profiles using genetic information from clinically available commercial next-generation sequencing (NGS) panels together with embeddings4 generated by a language model (figure 1). The model was trained on publicly available data, including ~8000 tumors representing 32 cancer types5, and was validated in additional heterogeneous cohorts.
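The abstract does not specify the model architecture, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain ridge regression, binary per-gene alteration flags as the panel input, and randomly generated vectors standing in for language-model gene embeddings. All names, sizes, and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and synthetic data (not taken from the abstract).
n_tumors, n_panel_genes, embed_dim, n_targets = 200, 50, 8, 20
# 1.0 if a panel gene is altered in a tumor, else 0.0.
alterations = rng.integers(0, 2, size=(n_tumors, n_panel_genes)).astype(float)
# Stand-in for one language-model embedding vector per panel gene.
gene_embeddings = rng.normal(size=(n_panel_genes, embed_dim))


def build_features(alterations, gene_embeddings):
    """Join alteration flags, summed embeddings of altered genes, and an intercept."""
    pooled = alterations @ gene_embeddings
    intercept = np.ones((alterations.shape[0], 1))
    return np.hstack([alterations, pooled, intercept])


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y.

    For brevity the intercept column is penalized like every other feature.
    """
    penalty = alpha * np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + penalty, X.T @ Y)


X = build_features(alterations, gene_embeddings)

# Synthetic "true" expression: a linear function of the features plus noise.
true_weights = rng.normal(size=(X.shape[1], n_targets))
expression = X @ true_weights + rng.normal(scale=0.5, size=(n_tumors, n_targets))

# Fit on the first 150 tumors and reconstruct expression for the held-out 50.
train, test = slice(0, 150), slice(150, None)
W = fit_ridge(X[train], expression[train], alpha=1.0)
reconstructed = X[test] @ W
```

Ridge regression is used here only because it has a closed-form solution and tolerates the collinearity between the alteration flags and the pooled embeddings. Any multi-output regressor could take its place.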

Results Gene expression reconstructed by this model was highly correlated with true expression (mean correlation per sample = 0.88; 95% CI 0.8818 to 0.8858; N=847). We used the reconstructed expression to predict a set of TME signatures that describe TME composition and phenotype and were previously associated with response to ICI therapy6 7 (mean correlation per sample = 0.81; 95% CI 0.8008 to 0.8170; N=756). We demonstrate that the reconstructed TME signatures were predictive of survival and provide interpretable biological insight into differences in patient outcomes across these cohorts.
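The summary statistic reported above (a mean per-sample correlation with a 95% CI) can be illustrated as follows. The abstract does not state which correlation coefficient or interval method was used, so this sketch assumes Pearson correlation across genes within each sample and a normal-approximation interval for the mean. The data are synthetic; only the matrix dimensions (847 samples, ~450 genes) are borrowed from the abstract.

```python
import numpy as np


def per_sample_pearson(true_expr, pred_expr):
    """Pearson correlation across genes, computed separately for each sample (row)."""
    t = true_expr - true_expr.mean(axis=1, keepdims=True)
    p = pred_expr - pred_expr.mean(axis=1, keepdims=True)
    return (t * p).sum(axis=1) / np.sqrt((t ** 2).sum(axis=1) * (p ** 2).sum(axis=1))


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI of the mean (mean +/- z * SEM)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(values.size)
    return mean, mean - z * sem, mean + z * sem


rng = np.random.default_rng(1)
# Synthetic stand-ins: rows are samples, columns are genes.
true_expr = rng.normal(size=(847, 450))
pred_expr = true_expr + rng.normal(scale=0.5, size=true_expr.shape)

r = per_sample_pearson(true_expr, pred_expr)
mean_r, ci_low, ci_high = mean_with_ci(r)
print(f"mean r = {mean_r:.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}], N = {r.size}")
```

With several hundred samples, the standard error of the mean is small, which is why an interval around a mean per-sample correlation can be much narrower than the spread of the individual correlations.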

Conclusions Our flexible analytic framework for reconstructing gene expression profiles from clinicogenomic data allows additional features to be integrated and enables prediction of cancer type- and subtype-specific features across diverse patient cohorts. This approach also has the potential to identify patients who might otherwise have been overlooked for ICI therapy, ultimately providing more precise and effective individualized treatment options, with the potential for improved outcomes for more cancer patients.

Acknowledgements The authors would like to acknowledge members of the Zephyr AI science, engineering and data teams.


  1. Shiravand Y, Khodadadi F, Kashani SMA, Hosseini-Fard SR, Hosseini S, Sadeghirad H, Ladwa R, O’Byrne K, Kulasinghe A. Immune Checkpoint Inhibitors in Cancer Therapy. Curr Oncol. 2022 Apr 24;29(5):3044–60.

  2. Riera-Domingo C, Audigé A, Granja S, Cheng WC, Ho PC, Baltazar F, Stockmann C, Mazzone M. Immunity, Hypoxia, and Metabolism-the Ménage à Trois of Cancer: Implications for Immunotherapy. Physiol Rev. 2020 Jan 1;100(1):1–102.

  3. Yang S, Wu Y, Deng Y, Zhou L, Yang P, Zheng Y, Zhang D, Zhai Z, Li N, Hao Q, Song D, Kang H, Dai Z. Identification of a prognostic immune signature for cervical cancer to predict survival and response to immune checkpoint inhibitors. Oncoimmunology. 2019 Oct 3;8(12):e1659094.

  4. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234–40.

  5. The Cancer Genome Atlas Program (TCGA).

  6. Jerby-Arnon L, Shah P, Cuoco MS, Rodman C, Su MJ, Melms JC, Leeson R, Kanodia A, Mei S, Lin JR, Wang S, Rabasha B, Liu D, Zhang G, Margolais C, Ashenberg O, Ott PA, Buchbinder EI, Haq R, Hodi FS, Boland GM, Sullivan RJ, Frederick DT, Miao B, Moll T, Flaherty KT, Herlyn M, Jenkins RW, Thummalapalli R, Kowalczyk MS, Cañadas I, Schilling B, Cartwright ANR, Luoma AM, Malu S, Hwu P, Bernatchez C, Forget MA, Barbie DA, Shalek AK, Tirosh I, Sorger PK, Wucherpfennig K, Van Allen EM, Schadendorf D, Johnson BE, Rotem A, Rozenblatt-Rosen O, Garraway LA, Yoon CH, Izar B, Regev A. A Cancer Cell Program Promotes T Cell Exclusion and Resistance to Checkpoint Blockade. Cell. 2018 Nov 1;175(4):984–97.e24.

  7. Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, Porta-Pardo E, Gao GF, Plaisier CL, Eddy JA, Ziv E, Culhane AC, Paull EO, Sivakumar IKA, Gentles AJ, Malhotra R, Farshidfar F, Colaprico A, Parker JS, Mose LE, Vo NS, Liu J, Liu Y, Rader J, Dhankani V, Reynolds SM, Bowlby R, Califano A, Cherniack AD, Anastassiou D, Bedognetti D, Mokrab Y, Newman AM, Rao A, Chen K, Krasnitz A, Hu H, Malta TM, Noushmehr H, Pedamallu CS, Bullman S, Ojesina AI, Lamb A, Zhou W, Shen H, Choueiri TK, Weinstein JN, Guinney J, Saltz J, Holt RA, Rabkin CS, Cancer Genome Atlas Research Network, Lazar AJ, Serody JS, Demicco EG, Disis ML, Vincent BG, Shmulevich I. The Immune Landscape of Cancer. Immunity. 2018 Apr 17;48(4):812–30.e14.

Abstract 1296 Figure 1

Gene expression reconstruction from real-world data: clinical features and genetic panels are used to reconstruct expression of a selected set of ~450 genes and 32 tumor microenvironment (TME) signatures predictive of response to immune checkpoint inhibitor (ICI) therapy.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See
