
Original research
Identification of pan-cancer/testis genes and validation of therapeutic targeting in triple-negative breast cancer: Lin28a-based and Siglece-based vaccination induces antitumor immunity and inhibits metastasis
  1. Jason A Carter1,2,3,
  2. Bharati Matta4,
  3. Jenna Battaglia4,
  4. Carter Somerville4,
  5. Benjamin D Harris1,5,
  6. Margaret LaPan4,
  7. Gurinder S Atwal1,6 and
  8. Betsy J Barnes4,7
  1. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
  2. Stony Brook University, Stony Brook, New York, USA
  3. Department of Surgery, University of Washington, Seattle, WA, USA
  4. Northwell Health Feinstein Institutes for Medical Research, Manhasset, New York, USA
  5. Lyell Immunopharma, South San Francisco, CA, USA
  6. Regeneron Pharmaceuticals Inc, Tarrytown, NY, USA
  7. Departments of Pediatrics and Molecular Medicine, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
  Correspondence to Dr Betsy J Barnes; bbarnes1{at}northwell.edu

Abstract

Background Cancer–testis (CT) genes are targets for tumor antigen-specific immunotherapy given that their expression is normally restricted to the immune-privileged testis in healthy individuals with aberrant expression in tumor tissues. While they represent targetable germ tissue antigens and play important functional roles in tumorigenesis, there is currently no standardized approach for identifying clinically relevant CT genes. Optimized algorithms and validated methods for accurate prediction of reliable CT antigens (CTAs) with high immunogenicity are also lacking.

Methods Sequencing data from the Genotype-Tissue Expression (GTEx) project and the Genomic Data Commons (GDC) were used to develop a bioinformatic pipeline that identifies CT-exclusive genes. A CT germness score was calculated based on the number of CT genes expressed within a tumor type and their degree of expression. The impact of tumor germness on clinical outcome was evaluated using healthy GTEx and GDC tumor samples. We then used a triple-negative breast cancer mouse model to develop and test an algorithm that predicts epitope immunogenicity based on the identification of germline sequences with strong major histocompatibility complex class I (MHCI) and MHCII binding affinities. Germline sequences of CT genes were synthesized as synthetic long peptide vaccines and tested with Poly(I:C) adjuvant in the 4T1 triple-negative model of invasive breast cancer. Vaccine immunogenicity was determined by flow cytometric analysis of in vitro and in vivo T-cell responses. Primary tumor growth and lung metastasis were evaluated by histopathology, flow cytometry and colony formation assay.

Results We developed a new bioinformatic pipeline to reliably identify CT-exclusive genes as immunogenic targets for immunotherapy. We identified CT genes that are exclusively expressed within the testis, lack detectable thymic expression, and are significantly expressed in multiple tumor types. High tumor germness correlated with tumor progression but not with tumor mutation burden, supporting CTAs as appealing targets in low-mutation-burden tumors. Importantly, tumor germness also correlated with markers of antitumor immunity. Vaccination of 4T1 tumor-bearing mice with Siglece and Lin28a antigens increased antitumor T-cell immunity and reduced primary tumor growth and lung metastases.

Conclusion Our results present a novel strategy for the identification of highly immunogenic CTAs for the development of targeted vaccines that induce antitumor immunity and inhibit metastasis.

  • antigens, neoplasm
  • breast neoplasms
  • CD8-Positive T-Lymphocytes
  • computational biology
  • immunogenicity, vaccine

Data availability statement

Data are available upon reasonable request. All data relevant to the study are included in the article or uploaded as online supplemental information.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.


WHAT IS ALREADY KNOWN ON THIS TOPIC

  • The CTdatabase provides a repository of over 200 known CT genes, some of which have been tested in phase III trials with limited success. The advent of large public sequencing repositories of healthy tissues and cancer types enables the development of standardized methods and criteria to identify immunogenic CT genes for targeted vaccine therapies.

WHAT THIS STUDY ADDS

  • This study implemented a new bioinformatic pipeline that utilizes an unbiased, genome-wide screen of public domain sequencing data to identify and subsequently validate in vivo immunogenic targets for immunotherapy.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Findings from this study provide a rationale for further investigation of therapeutic antigen-based vaccines that target immunogenic tumor-specific CT genes for the treatment of breast cancer.

Introduction

Immunotherapy relies on the ability of the adaptive immune system to differentiate neoplastic cells from healthy tissues, most commonly via the recognition of tumor neoantigens arising from somatic mutations.1 2 Personalized cancer vaccines enable targeting of specific tumor neoantigens3 4 and may therefore work synergistically with immune checkpoint blockade (ICB) therapies,5 with vaccination against immunogenic mutations in melanoma6–8 and glioblastoma9–11 patients showing early clinical promise. However, random somatic mutations, aptly referred to as private neoantigens, are rarely shared between individuals and therefore have limited generalizability across patients.12 Public neoantigens shared across many tumors, in contrast, have off-the-shelf therapeutic potential, but have been largely restricted to rare cancer driver mutations.13 Cancer–testis (CT) genes represent a distinct alternative source of cancer-related antigens potentially amenable to a broad class of immunotherapies, including personalized vaccines and cell therapy.

CT genes, defined by expression restricted to the immune-privileged testis in healthy tissue and ectopic expression in cancer, contain germline sequences that are immunogenic when ectopically expressed by neoplastic cells.14–16 While the aberrant expression of CT genes has historically been thought to arise from epigenetic dysregulation,17 18 recent work has increasingly demonstrated crucial roles for CT gene expression in oncogenesis,19 metastatic disease progression20–22 and potentially ICB response.23 CT gene upregulation has been observed across numerous tumor types,24 including those with relatively low tumor mutational burdens such as hormone receptor-negative tumors,25–27 making CT genes potentially widely applicable targets for new immunotherapeutics. Despite the promising results of CT antigen (CTA)-based immunotherapy in many cancers such as melanoma, glioblastoma, and ovarian tumors (NCT02166905; NCT00071981; NCT01213472; NCT02650986; NCT01967823), such vaccination trials in patients with breast cancer remain limited or have been withdrawn owing to null findings (NCT03110445; NCT01522820; NCT03093350; NCT02015416).26

The CTdatabase provides a repository of over 200 known CT genes compiled from the literature.28 However, the methods used in CT gene discovery, and even the criteria used to define CT gene expression thresholds, have not been standardized.29–31 More recently, the Genotype-Tissue Expression (GTEx) project32 and the Genomic Data Commons (GDC), including The Cancer Genome Atlas (TCGA)33 and Therapeutically Applicable Research to Generate Effective Treatments (TARGET),33 34 public sequencing repositories of healthy tissues and cancer types, respectively, have enabled high-throughput screening for CT expression patterns.35 36 These approaches have not, however, focused on the identification of immunogenic CT genes for targeted vaccine therapies, a goal that requires a high degree of testis exclusivity, a lack of central tolerance (ie, absent thymic expression), and robust cancer expression that ideally reflects a function as an oncogenic driver.

Here, we implemented a new bioinformatic pipeline to perform an unbiased, genome-wide screen aimed at identifying genes with robust CT expression that are more likely to represent immunogenic targets for immunotherapy. We identified 103 CT genes, 70 (68%) of which are not present in the CTdatabase, that meet a highly conservative exclusivity expression threshold, lack detectable thymic expression, and are significantly expressed in multiple tumor types. We further developed and validated an ad hoc algorithm that incorporates both major histocompatibility complex class I (MHCI) and MHCII antigen-binding strength predictions37 to optimize the selection of immunogenic CT gene germline sequences for synthetic long peptide vaccination.38 Two test CT genes, Siglece and Lin28a, were selected for validation based on their testis-exclusive expression in humans and their differential expression in 4T1 mouse primary mammary tumors and lung metastases. CTA vaccination against Siglece or Lin28a induced robust CD4+ helper and CD8+ cytolytic T-cell responses, with a significant reduction in primary tumor growth and lung metastases in a mouse model of triple-negative breast cancer (TNBC).

Methods

Identification of testis-exclusive genes

Gene-level RNA-seq data was downloaded directly from V.8 of the GTEx project.32 In all, we obtained RNA-sequencing data from 17,382 healthy samples representing 30 different tissue types, with gene expression quantified as transcripts per million (TPM). We then used the Ensembl39 annotation to identify protein-coding genes, removing genes without protein IDs from downstream analysis. To differentiate tissue samples that actively expressed each gene from transcriptional and technical noise, we chose a threshold of 2 TPM.40 Using this conservative threshold, we then calculated the Phi Correlation Coefficient41 (ϕ) for each protein-coding gene with respect to all 30 GTEx healthy tissue types. Specifically, ϕ_i for the ith tissue type and a given gene is calculated as:

$$\phi_i = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$

where a is the number of ith tissue samples expressing that gene (at a threshold of ≥2 TPM), b is the number of samples with gene expression across all other tissue types, c is the number of ith tissue samples lacking expression of that gene, and d is the number of all other tissue samples without gene expression. A ϕ value of 0.95 corresponds to approximately 2.5% of all samples having non-exclusive expression (ie, Embedded Image). Gene ontology analysis was conducted using the PANTHER (Protein ANalysis THrough Evolutionary Relationships) classification system.42
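As an illustration of the exclusivity calculation above, the following Python sketch builds the 2×2 counts a, b, c and d for one gene and one tissue at the 2 TPM threshold and returns ϕ. It is a minimal sketch rather than the authors' analysis code: the flat per-sample input lists and the function names are hypothetical, and scoring a degenerate table (for example, a gene expressed in no sample) as 0 is our assumption.

```python
import math
from typing import Sequence

EXPRESSION_THRESHOLD_TPM = 2.0  # expression threshold stated in the Methods
EXCLUSIVITY_CUTOFF = 0.95       # phi cutoff for calling a gene tissue-exclusive


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient of the 2x2 table defined in the text."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # assumption: degenerate tables score as non-exclusive
    return (a * d - b * c) / denom


def tissue_phi(tpm: Sequence[float], tissues: Sequence[str], tissue: str) -> float:
    """Exclusivity of one gene for one tissue across all samples (hypothetical inputs)."""
    a = b = c = d = 0
    for value, sample_tissue in zip(tpm, tissues):
        expressed = value >= EXPRESSION_THRESHOLD_TPM
        in_tissue = sample_tissue == tissue
        if in_tissue and expressed:
            a += 1  # sample of the ith tissue expressing the gene
        elif expressed:
            b += 1  # sample of another tissue expressing the gene
        elif in_tissue:
            c += 1  # sample of the ith tissue lacking expression
        else:
            d += 1  # sample of another tissue lacking expression
    return phi_coefficient(a, b, c, d)


# Toy example: expressed in both testis samples and in no other sample.
phi = tissue_phi([5.0, 8.0, 0.1, 1.5], ["Testis", "Testis", "Lung", "Liver"], "Testis")
print(phi, phi >= EXCLUSIVITY_CUTOFF)
```

A gene would then be called exclusive to a tissue when the returned value is at least 0.95.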

Identification of genes without thymic expression

RNA-sequencing data from human medullary thymic epithelial cell (mTEC) samples was obtained under accession codes GSE201720 and GSE127209.43 44 We included both MHCII low (immature) and MHCII high (mature) mTEC samples from both previously published datasets. Transcript abundance was quantified using Kallisto,45 and 2 TPM was again used as the threshold for thymic expression. Testis-exclusive genes with expression in any thymic sample were excluded from further analysis.
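The thymic filter reduces to a single rule: drop any testis-exclusive gene that reaches 2 TPM in any mTEC sample. A minimal sketch, assuming a hypothetical dictionary that maps each gene to its per-sample thymic TPM values (for example, aggregated from Kallisto output); genes absent from that dictionary are treated as unexpressed, which is our assumption.

```python
from typing import Dict, Iterable, List, Sequence

THYMIC_THRESHOLD_TPM = 2.0  # same threshold as used for healthy tissues


def drop_thymic_genes(
    testis_exclusive: Iterable[str],
    thymic_tpm: Dict[str, Sequence[float]],
) -> List[str]:
    """Keep only genes below 2 TPM in every mTEC sample."""
    kept = []
    for gene in testis_exclusive:
        samples = thymic_tpm.get(gene, [])  # assumption: unquantified gene = not expressed
        if any(value >= THYMIC_THRESHOLD_TPM for value in samples):
            continue  # expressed in at least one thymic sample: likely centrally tolerized
        kept.append(gene)
    return kept


print(drop_thymic_genes(["GENE_A", "GENE_B", "GENE_C"],
                        {"GENE_A": [0.1, 3.0], "GENE_B": [0.5, 1.9]}))
```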

Identification of cancer/testis genes

We next examined the expression of these protein-coding, testis-exclusive genes without thymic expression across 13,237 tumor samples representing 38 cancer types. Tumor-sequencing and clinical data were obtained from the GDC,46 including both TCGA and the TARGET initiative.33 34 Only cancer types with sequencing data from at least 50 samples were included. Using the expression threshold of 2 TPM, we calculated the fraction of tumors in each cancer type expressing each gene. We defined CT genes as those testis-exclusive genes without thymic expression that were expressed in at least 5% of tumors in each of at least two different tumor types. Because testicular germ cell tumors highly express testis-exclusive genes, they were not included in the determination of CT gene expression. Previously identified CT genes were obtained from the CTdatabase.28
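The CT gene definition above can be sketched as follows. The nested-dictionary layout (cancer type, then gene, then per-sample TPM values), the function name, and the use of the TCGA code "TGCT" to label testicular germ cell tumors are illustrative assumptions; the thresholds (2 TPM, 5% of tumors, at least two tumor types, at least 50 samples per type) are those stated in the text.

```python
from typing import Dict, Iterable, Sequence, Set

TPM_THRESHOLD = 2.0
MIN_SAMPLES_PER_TYPE = 50
MIN_FRACTION_EXPRESSING = 0.05
MIN_TUMOR_TYPES = 2
EXCLUDED_TYPES = {"TGCT"}  # assumption: label used for testicular germ cell tumors


def call_ct_genes(
    candidates: Iterable[str],
    tumor_tpm: Dict[str, Dict[str, Sequence[float]]],
) -> Set[str]:
    """tumor_tpm[cancer_type][gene] holds per-sample TPM values (hypothetical layout)."""
    ct_genes = set()
    for gene in candidates:
        qualifying_types = 0
        for cancer_type, by_gene in tumor_tpm.items():
            if cancer_type in EXCLUDED_TYPES:
                continue  # testicular germ cell tumors are not used for CT calling
            values = by_gene.get(gene, [])
            if len(values) < MIN_SAMPLES_PER_TYPE:
                continue  # too few samples for this cancer type
            fraction = sum(v >= TPM_THRESHOLD for v in values) / len(values)
            if fraction >= MIN_FRACTION_EXPRESSING:
                qualifying_types += 1
        if qualifying_types >= MIN_TUMOR_TYPES:
            ct_genes.add(gene)
    return ct_genes


brca = {"G1": [5.0] * 3 + [0.0] * 47, "G2": [5.0] * 10 + [0.0] * 40}
luad = {"G1": [5.0] * 3 + [0.0] * 47, "G2": [0.0] * 50}
tgct = {"G1": [9.0] * 50, "G2": [9.0] * 50}
print(call_ct_genes(["G1", "G2"], {"BRCA": brca, "LUAD": luad, "TGCT": tgct}))
```

Here G1 qualifies (6% of tumors in two types), whereas G2 is expressed in only one eligible type once testicular tumors are set aside.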

Evaluation of germness score

We defined a germness score as the log average expression of our 103 CT genes, explicitly:

$$\text{Germness} = \log\left(\frac{1}{N}\sum_{i=1}^{N} G_i\right)$$

where Gi is the expression (in TPM) of the ith CT gene and N is the total number of CT genes. We calculated the germness score for each GTEx healthy and GDC tumor sample. Clinical data was obtained from GDC46 and additional features of the tumor immune microenvironment for GDC samples were obtained from Thorsson et al.47 Kaplan-Meier curves were generated using the Python lifelines package.48
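A minimal sketch of the germness calculation for a single sample follows. The text does not specify the logarithm base or how a zero average is handled, so the base-2 logarithm and the pseudocount of 1 TPM used here are assumptions, not the published definition.

```python
import math
from typing import Sequence


def germness(ct_tpm: Sequence[float], pseudocount: float = 1.0) -> float:
    """Log of the mean TPM across the CT gene set for one sample.

    Assumptions: base-2 log, plus a pseudocount so that samples with no CT
    expression score 0 instead of raising a math domain error.
    """
    if len(ct_tpm) == 0:
        raise ValueError("germness needs at least one CT gene")
    mean_tpm = sum(ct_tpm) / len(ct_tpm)
    return math.log2(mean_tpm + pseudocount)


print(germness([7.0, 7.0, 7.0, 7.0]))  # mean 7 TPM -> log2(8) = 3.0
print(germness([0.0] * 103))           # no CT expression -> 0.0
```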

Mice

Wild-type Balb/c mice were purchased from The Jackson Laboratory (Bar Harbor, Maine, USA). Due to the nature of the study, only female mice aged 8–12 weeks were used. Mice were kept under environment-controlled, pathogen-free conditions with a 12-hour light/dark cycle at an ambient temperature of 23°C±2°C. This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the NIH (National Research Council, Guide for the Care and Use of Laboratory Animals, National Academies Press, Washington, DC, ed. 8, 2011).

Selection of CTA sequences for vaccination

We sequenced six replicates each of healthy thymus, mammary and lung mouse tissue, as well as of thymus from tumor-bearing mice, primary mammary tumors and lung metastases. Abundance quantification and differential expression analyses were performed using Kallisto45 and Sleuth,49 respectively. After examining the expression of mouse homologs of human protein-coding, testis-exclusive genes without thymic expression across each of these 36 healthy and tumor-bearing mouse samples, we identified Lin28a and Siglece as potential CT vaccine targets because their expression was limited to primary mammary tumors and/or lung metastases.

The entire germline sequences of both Lin28a and Siglece were divided into all possible substrings of 25–30 amino acids, and the number of peptides predicted to bind to H-2-Dd, H-2-Kd, H-2-Ld, H-2-IAd and H-2-IEd was calculated for each substring using NetMHCpan.37 Each substring was then scored according to its number of strong (top 0.5% and 1% of predicted MHCI and MHCII binders, respectively) and weak (top 2% and 5%, respectively) binders, with weak binders weighted as ¼ of strong binders. The two top-scoring distinct peptide sequences were selected for both Siglece (S) and Lin28a (L). The amino acid sequences chosen for synthesis (with the transcripts containing each sequence) were: S1, KKDAGLYFFRLERGKTKYNYMWDKMTLVV (XP_006541381.1, XP_006541382.1, XP_006541383.1, XP_006541384.1); S2, TRMTIRLNVSYAPKNLTVTIYQGADSVSTI (XP_006541381.1, XP_006541382.1, XP_006541384.1); L1, KLPPQPKKCHFCQSINHMVASCPLKAQQGP (NP_665832.1, XP_011248663.1, XP_006539380.1); and L2, EAVEFTFKKSAKGLESIRVTGPGGVFCIGS (NP_665832.1). Peptides were synthesized by LifeTein, and purity was confirmed by high-performance liquid chromatography (HPLC) and mass spectrometry.
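The sliding-window scoring described above can be sketched as follows. The sketch does not call NetMHCpan; it assumes the per-peptide predictions have already been parsed into records holding each peptide's start position, length, MHC class and percentile rank (the Binder record and all function names are hypothetical). A binder counts toward a window only if the whole peptide lies inside it, and "distinct" is interpreted as non-overlapping; both are our assumptions.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Binder(NamedTuple):
    """One predicted peptide (hypothetical record parsed from NetMHCpan output)."""
    start: int      # 0-based start position in the protein
    length: int     # peptide length in amino acids
    mhc_class: int  # 1 for MHCI alleles, 2 for MHCII alleles
    rank: float     # percentile rank; lower means stronger predicted binding


STRONG_RANK = {1: 0.5, 2: 1.0}  # strong: top 0.5% (MHCI) and 1% (MHCII)
WEAK_RANK = {1: 2.0, 2: 5.0}    # weak: top 2% (MHCI) and 5% (MHCII)

Window = Tuple[float, int, int]  # (score, start, end), end exclusive


def binder_weight(b: Binder) -> float:
    if b.rank <= STRONG_RANK[b.mhc_class]:
        return 1.0
    if b.rank <= WEAK_RANK[b.mhc_class]:
        return 0.25  # weak binders count as 1/4 of a strong binder
    return 0.0


def score_windows(protein_len: int, binders: Sequence[Binder],
                  min_len: int = 25, max_len: int = 30) -> List[Window]:
    """Score every substring of 25-30 residues, best first."""
    scored = []
    for width in range(min_len, max_len + 1):
        for start in range(protein_len - width + 1):
            end = start + width
            score = sum(binder_weight(b) for b in binders
                        if b.start >= start and b.start + b.length <= end)
            scored.append((score, start, end))
    scored.sort(key=lambda w: (-w[0], w[1], w[2]))
    return scored


def top_distinct(scored: Sequence[Window], k: int = 2) -> List[Window]:
    """Greedily take the best windows that do not overlap already chosen ones."""
    chosen: List[Window] = []
    for score, start, end in scored:
        if all(end <= s or start >= e for _, s, e in chosen):
            chosen.append((score, start, end))
            if len(chosen) == k:
                break
    return chosen


binders = [Binder(0, 9, 1, 0.3), Binder(5, 15, 2, 3.0),
           Binder(35, 5, 1, 10.0), Binder(40, 9, 1, 0.1)]
print(top_distinct(score_windows(60, binders)))
```

Enumerating every window is cheap at the scale of a single protein, so no indexing of binder positions is attempted.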

In vivo tumor initiation and metastasis

To initiate primary tumor growth and lung metastasis, 1×10⁵ freshly prepared 4T1 cells in mid-log phase growth were orthotopically injected into the fourth right mammary fat pad of Balb/c mice. Two vaccination protocols were assessed: vaccination (1) before and (2) after tumor cell inoculation. The day of tumor cell inoculation was designated day 0 on the dosing strategy chart. Peptides (S1, L1 or L2; 100 µg) were injected subcutaneously (SC) with 100 µg polyinosinic:polycytidylic acid (Poly(I:C), PIC; Sigma cat# P9582-5MG) as an adjuvant in 100 µL phosphate-buffered saline (PBS). Mice were sacrificed on different days depending on the experiment; tumors were weighed, and organs were harvested for fixation and histopathological analysis, flow cytometric analysis of T-cell composition, and T-cell proliferation assays. Lung metastases were quantified by counting the number of visible nodules on the surface of the lung. Counting was performed in a blinded manner by three independent readers.

Lung clonogenic assay

Mice from the prevention strategy were euthanized on day 28, and lungs were harvested from treated mice. Single-cell suspensions were prepared, and 200 cells were plated in complete media with 60 µM 6-thioguanine for 14 days. Non-adherent cells were washed out, and adherent 4T1 clones were fixed with methanol for 1 hour, washed, and stained with 0.5% crystal violet for 2 hours, followed by washing with PBS. Images were taken on an Evos M700 imaging system. Each dot represents a 4T1 colony; dots were counted for quantitation.

Splenocyte proliferation assay

Spleens of vaccinated mice (S1, L1 or L2; 100 µg) were harvested 2 weeks after the last injection, and single-cell suspensions were prepared. The cells were stained with 5 µM CFSE (carboxyfluorescein succinimidyl ester; Invitrogen, C34554). Briefly, cells were incubated with the dye in the dark for 8–10 min and then washed twice with 10 mL ice-cold PBS+10% fetal bovine serum (FBS) to remove excess dye. The cells were then plated with the matched antigen (S1, L1 or L2; 1 µg/well) for 4 days. As controls, CFSE-stained splenocytes from naïve mice were plated with all three peptides. On day 5, cells were collected and stained for T-cell surface markers. Samples were acquired on a BD Fortessa and analyzed with FlowJo (BD) and ModFit.
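CFSE is diluted roughly twofold with each cell division, so division number can be estimated from the ratio of the undivided peak intensity to each cell's intensity. The sketch below is a deliberately simplified illustration of that principle, not the peak-fitting model used by ModFit; the function names, the intensity inputs, and the cap of eight generations are assumptions.

```python
import math
from typing import Dict, Sequence


def assign_generations(intensities: Sequence[float], undivided_peak: float,
                       max_generation: int = 8) -> Dict[int, int]:
    """Bin cells by estimated division number from CFSE dilution."""
    counts: Dict[int, int] = {}
    for value in intensities:
        if value <= 0:
            continue  # skip non-positive events, which have no defined log ratio
        generation = round(math.log2(undivided_peak / value))  # ~halving per division
        generation = min(max(generation, 0), max_generation)
        counts[generation] = counts.get(generation, 0) + 1
    return counts


def percent_divided(counts: Dict[int, int]) -> float:
    """Percentage of cells that have left generation 0."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - counts.get(0, 0)) / total


gens = assign_generations([1000.0, 980.0, 500.0, 250.0, 260.0], undivided_peak=1000.0)
print(gens, percent_divided(gens))
```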

Flow cytometry

Thymi, spleens, tumors and lungs were collected, and single-cell suspensions were prepared. The cells were treated with RBC lysis buffer, washed, blocked in PBS supplemented with Fc Blocker (BioLegend #422302) for 15 min, and then stained with antibodies against surface markers for 30 min. After staining, cells were washed twice in PBS without Mg2+ or Ca2+ and fixed in 4% paraformaldehyde (PFA) before analysis on a BD Fortessa. For intracellular staining, cells were permeabilized after fixation in 0.1% Triton X-100 and then stained for intracellular markers. Most flow cytometry antibodies were purchased from BioLegend (CD45-PerCP #103130, CD3-BV650 #100229, CD4-BV421 #100443, CD8-PerCP #100732, CD8-PE #140408, IFNγ-APC #505809, Granzyme B-FITC #515403, Siglec-E-APC #677105). Live/dead fixable stains Aqua (#L34957) and Green (#L23105) were obtained from Thermo Fisher Scientific. The LIN28A-AF488 antibody was from Cell Signaling Technology (#12573S).

Statistical analysis and code availability

Custom analysis code was written in Python (V.>3.8) or R (V.>4.03) for the identification of CT genes using GTEx and GDC data. All boxplots show median values with IQRs and extrema (whiskers at 1.5×IQR). Comparisons between paired samples were performed using a paired t-test; otherwise, statistical significance was assessed using a Mann-Whitney U test. Multiple hypothesis testing was corrected using the Bonferroni method. GraphPad Prism software was used for statistical analyses of the mouse vaccination studies. Quantitative data are presented as mean±SEM. Non-parametric data were analyzed by two-tailed Mann-Whitney U tests. Comparisons among immunization groups were performed by one-way analysis of variance (ANOVA) with Tukey’s post hoc test. P<0.05 was considered statistically significant for all analyses.
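For readers reproducing the human-cohort comparisons, the sketch below shows a two-sided Mann-Whitney U test (normal approximation with tie correction, no continuity correction) and Bonferroni adjustment in plain Python. It is an illustration only, not the authors' code; in practice scipy.stats.mannwhitneyu and scipy.stats.ttest_rel are the usual tools, and exact p-values for small samples will differ from this asymptotic approximation.

```python
import math
from typing import List, Sequence, Tuple


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def _midranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks with ties given their average rank, plus tie-block sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    ranks, ties = _midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance <= 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, round(p, 4), bonferroni([p, 0.2, 0.6]))
```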

Results

Identification of immunogenic CT genes

The CTdatabase28 provides a resource of known genes with CT expression, but many of these genes were identified in early studies with limited sample sizes. Although more recent work has used large public sequencing repositories, there are no standardized criteria defining thresholds for CT expression, nor have these studies focused on the high-throughput identification of immunogenic CT genes. With this in mind, we first implemented a bioinformatic pipeline to identify robust CT gene expression from existing sequencing repositories spanning more than 30,000 healthy tissue and tumor samples (figure 1A). As our goal was to identify genes with the highest probability of serving as immunogenic targets for standardized ‘off-the-shelf’ immunotherapy, we (1) identified protein-coding genes with robust testis-exclusive expression, (2) removed testis-exclusive genes with any detectable human thymic expression (as these are most likely subject to central tolerance despite testis-exclusive peripheral expression), and (3) identified, within the remaining subset, genes with meaningful expression in multiple tumor types. Together, our high-throughput screening approach identified 103 high-confidence CT gene candidates, including 70 CT genes that are not present in the CTdatabase and are potential public antigens.

Figure 1

Identification of high-confidence cancer–testis genes. (A) Schematic overview of our bioinformatic pipeline to identify cancer–testis genes. Briefly, we identified testis-exclusive genes among Genotype-Tissue Expression (GTEx) healthy tissues using the Phi Correlation Coefficient41 (Φ), excluded those with detectable thymic expression, and then identified a subset with robust tumor expression. (B) Distribution of gene exclusivity scores (Φ) by GTEx tissue type. Those genes with Φ≥0.95 are considered exclusive to that tissue type. (C) Number of tissue-exclusive genes found for each tissue type. (D) Gene ontology enrichment analysis42 50 of testis-exclusive genes. Pos Reg, positive regulation. (E) Thymic expression of testis-exclusive genes. Those testis-exclusive genes with detectable thymic expression (≥2 transcripts per million (TPM) in at least one thymic sample) were excluded. (F) The fraction of tumors expressing (≥2 TPM) a given testis-exclusive gene was calculated for each The Cancer Genome Atlas (TCGA) tumor type. Cancer–testis (CT) genes were defined as those testis-exclusive genes without thymic expression that were expressed in at least 5% of tumors from at least two different tumor types.

Specifically, as CT genes are strictly defined by an expression pattern limited to healthy testis tissue with undetectable expression in all other tissue types, we first examined gene expression in more than 17,000 healthy samples from 30 tissue types in the GTEx database.32 Using a threshold of 2 TPM to differentiate active gene expression from transcriptional noise,40 we calculated the Phi Correlation Coefficient41 (ϕ) for each protein-coding gene in each tissue type (figure 1A). To identify gene expression restricted to a single tissue type with a high degree of confidence, we considered a gene tissue-exclusive only if its ϕ was greater than or equal to 0.95 for a particular tissue type (figure 1B). Using this framework, we identified 440 tissue-exclusive protein-coding genes, of which 408 (93%) were exclusively expressed in the testis (figure 1C and online supplemental tables 1 and 2). Gene ontology analysis42 50 of these testis-exclusive genes confirmed a range of testis-related functions, including positive regulation of fertilization, gamete DNA methylation, and male meiosis I (figure 1D).


As the testis is immune privileged, testis-exclusive genes are not subject to the same central tolerance requirement as genes expressed in other peripheral tissues.51 52 Furthermore, as thymic expression and central tolerance of a gene preclude immunity against its antigens,53 we next directly examined the thymic expression of these testis-exclusive genes to further maximize the chance of selecting genes with immunogenic germline sequences. We found that 89 of our 408 (22%) testis-exclusive genes had detectable thymic expression across healthy human thymic samples,43 44 leaving 319 protein-coding testis-exclusive genes with undetectable thymic expression and thereby a lower chance of being recognized as normal self-peptides (figure 1E and online supplemental figure 1). Of note, only 64 of 221 (29%) CTdatabase genes met our threshold for testis-exclusive expression, and 8 of these 64 (12.5%) had detectable thymic expression (online supplemental figure 2). Together, only 56 of 221 (25%) CTdatabase genes met our conservative thresholds for testis exclusivity and absent thymic expression, and these known CT genes constitute only 18% of our final 319 testis-exclusive genes without thymic expression (online supplemental table 2).

We further examined the expression of this set of testis-exclusive protein-coding genes lacking thymic expression across more than 13,000 individual tumor samples spanning 38 cancer types from the GDC, including data from both TCGA and the TARGET initiative. As we were interested in identifying potential CT genes that could represent widely conserved antigens, we screened this set of testis-exclusive genes without thymic expression for CT genes that were expressed in at least 5% of tumors (again using an expression threshold of 2 TPM) in at least 2 different cancer types (figure 1F and online supplemental figure 3). Using this conservative definition, we identified 103 CT genes, of which only 33 (32%) are present in the CTdatabase (online supplemental table 2).

CT gene expression correlates with tumor stage and antitumor immunity

We next explored CT gene expression across GDC tumor types. To facilitate comparison, we defined a germness score as the log average expression of our set of 103 CT genes in each sample, with higher germness scores correlating with more robust CT gene expression. We first compared germness between GDC tumor samples and paired healthy samples taken from adjacent parenchyma in the same individual. We found higher germness in tumor samples in 15 (10 statistically significant) of 18 tumor types relative to paired normal samples (figure 2A). Only kidney chromophobe had significantly higher germness in adjacent healthy tissue. Further examination across tumor types and their corresponding GTEx healthy tissues demonstrated widespread upregulation of CT gene expression in this pan-cancer cohort (figure 2B).

Figure 2

Association of cancer–testis gene expression and tumor characteristics. (A) Germness, defined as the log average expression of the 103 cancer–testis (CT) genes identified in this study, was higher in Genomic Data Commons (GDC) tumor samples than in paired tumor-adjacent healthy parenchymal samples in a majority of cancer types. *P≤0.05, **p≤0.01, ***p≤0.001 by paired t-test after Bonferroni correction. (B) Germness was then calculated for each GDC tumor type (with at least 50 samples) and its corresponding Genotype-Tissue Expression (GTEx) healthy tissue type, again demonstrating significantly higher germness in tumor samples relative to their respective healthy parenchyma. (C) Cancer types with higher non-silent mutation rates (R=0.67, p=1.1×10⁻⁴ by Wald test) and (D) higher proliferation rates (R=0.61, p=5.4×10⁻⁴) tended to have higher germness scores. (E) Higher germness scores are correlated with more advanced tumor stage. *P≤0.05, ***p≤0.001 by Mann-Whitney U test. (F) In The Cancer Genome Atlas (TCGA) pan-cancer cohort, tumors with high germness (top 50%) had decreased overall survival relative to low-germness tumors (p=1.1×10⁻⁹ by log rank test). (G) Pearson correlation between germness and tumor features and (H) tumor-immune microenvironment features taken from Thorsson et al.47 Statistical significance (p≤0.05 by Wald test after Bonferroni correction) across all tumor types is denoted by an asterisk (*).

Further examining CT gene expression across cancer types, we found positive correlations between median tumor germness and non-silent mutation rate (figure 2C, R=0.66, p=1.5×10⁻⁴ by Wald test), as well as between germness and tumor proliferation47 (figure 2D, R=0.64, p=2.2×10⁻⁴ by Wald test). Consistent with higher CT gene expression in cancer types with higher mutation and proliferation rates (figure 2C,D), germness tended to increase with tumor stage in this pan-cancer cohort (figure 2E), with significant increases between stages 1 and 2 (p=4.2×10⁻⁷ by Mann-Whitney U test) and between stages 3 and 4 of primary tumors (p=0.05 by Mann-Whitney U test). Further, consistent with a correlation between CT gene expression and the epithelial–mesenchymal transition,54 germness was significantly higher in metastatic tumor samples than in stage 4 primary tumors (figure 2E, p=4.0×10⁻⁵⁰ by Mann-Whitney U test). Similarly, consistent with the association between high germness and more advanced disease, overall survival was significantly higher for patients with low-germness tumors (figure 2F, p=1.1×10⁻⁹ by log rank test).
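The survival comparison splits tumors at the median germness score and compares Kaplan-Meier curves. The paper used the lifelines package for this; the standard-library sketch below only illustrates the product-limit estimator and a median split. Assigning tumors exactly at the median to the low group is our assumption, as are the function names.

```python
import statistics
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate: (time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])  # event (death) observed at time t
            removed += 1               # deaths and censored cases both leave the risk set
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def high_germness(scores: Sequence[float]) -> List[bool]:
    """True for tumors above the median score (top 50%); ties go to the low group."""
    cutoff = statistics.median(scores)
    return [s > cutoff for s in scores]


print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
print(high_germness([0.5, 2.1, 1.3, 3.0]))
```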

We next calculated the Pearson correlation between germness and various tumor features within each tumor type. We found strong correlations across tumor types for tumor proliferation, recombination defects, tumor aneuploidy, and intratumor heterogeneity (figure 2G). Although cancer types with higher rates of non-silent mutation tended to have higher germness scores, we did not identify a significant correlation between germness and non-silent mutation rate, single-nucleotide variant neoantigens, or indel neoantigens at the individual tumor level. Among features of the tumor immune microenvironment, germness was positively correlated with leukocyte fraction, IFN-γ response, Th2 cells, undifferentiated M0 and proinflammatory M1 macrophages, and T-cell receptor Shannon entropy (figure 2H). Germness was negatively correlated with the intratumoral presence of anti-inflammatory M2 macrophages, as well as memory CD4+, Th1, and Th17 T cells.

Bioinformatic development of an immunogenic CTA peptide vaccine

We have so far identified a set of high-confidence testis-exclusive genes that are likely to contain immunogenic germline antigens when ectopically expressed in tumors (online supplemental table 2). Further, we have shown that the expression of these genes in human cancers is correlated with more advanced tumor stages and often with markers of antitumor immunity. To begin to explore the potential of these CT genes as specific targets for immunotherapy, we focused on the identification of CTAs in breast cancer (online supplemental table 3). We chose breast cancer because it is a mid-to-low mutational burden cancer type, in contrast to the high mutational burden tumor types in which current immunotherapies have shown the greatest success. Moreover, we focused on TNBCs (tumors that lack ER (estrogen receptor), PR (progesterone receptor), and HER2 (human epidermal growth factor receptor 2) expression, most of which are basal-like) because they represent 10%–20% of all breast cancers, are highly aggressive, frequently metastasize, lack targeted therapies, and have a poor prognosis. Because TNBCs do not respond to approved hormone therapies, therapeutic options remain limited. Consistent with previous studies showing higher CT gene expression in hormone receptor-negative tumors,27 we found the highest germness scores for the basal-like breast cancer subtype among GDC samples55 56 (figure 3A). We therefore elected to use the 4T1 murine mammary cancer cell line as a model of TNBC.

Figure 3

Development of cancer–testis (CT) antigen vaccines for a mouse triple-negative breast cancer model. (A) Germness by PAM50 subtype56 across Genomic Data Commons (GDC) human breast cancer samples. The highest germness scores were observed for the basal subtype, which includes the majority of triple-negative tumors.82 *p<0.05, ***p<0.001 by Mann-Whitney U test. (B) Principal component (PC) analysis of thymic, mammary, and lung samples from wild-type (light colors) and 4T1 tumor-inoculated (dark colors) mice. (C) Volcano plots showing CT genes upregulated in primary tumor relative to thymus samples, (D) as well as in metastatic lung tumor relative to thymus samples. Dotted line represents q=0.05. (E) Expression of Siglece (homolog of the human testis-exclusive, thymic-unexpressed gene SIGLECL1) is limited to primary murine mammary tumors and metastatic lung samples, (F) while Lin28a (homolog of LIN28A) is limited to metastatic lung tumors. (G) Germline sequences of both Siglece and Lin28a were scored according to their predicted major histocompatibility complex class I (MHCI) and MHCII binding affinities. Strong binders were defined as the top 0.5% and 1% of peptides for MHCI and MHCII, respectively, while weak binders represented the top 2% and 5%. Each sliding window was scored as the number of strong binding peptides plus ¼ times the number of weak binding peptides it contained. (H) Two of the top-scoring germline sequences were selected as candidate CT antigen targets and synthesized as long peptide vaccines. (I) Both CT antigen vaccines induced CD4+ helper and (J) CD8+ cytotoxic T-cell proliferation in presensitized but not naïve control (c) mice. PBS, phosphate-buffered saline; PIC, Poly(I:C).

We first sequenced mammary, thymus, and lung tissue samples from healthy BALB/c mice, in addition to thymic, primary mammary tumor, and metastatic lung tumor samples from mice orthotopically implanted with 4T1 mammary tumor cells into the fourth mammary fat pad (figure 3B). We found significant upregulation of many of our human testis-exclusive genes in both mouse primary mammary and metastatic lung tumor samples relative to paired thymic samples (figure 3C,D and online supplemental tables 4 and 5). To identify mouse CT genes with the highest likelihood of containing immunogenic sequences, we searched for upregulated genes that were expressed in tumor samples but not in our healthy murine tissue samples, again using 2 TPM as our expression threshold (online supplemental figure 4 and online supplemental table 6). Using these criteria, we identified two murine CT gene targets: Siglece, an ortholog of SIGLECL1 with expression limited to both primary mammary and metastatic lung samples, and Lin28a, an ortholog of LIN28A with expression observed only in lung metastases (figure 3E,F). We next examined tissue-specific protein expression of Siglece and Lin28a in mouse thymic, primary mammary tumor, and metastatic lung tumor samples by flow cytometry. As 4T1 cells do not express a tumor-specific marker per se, we gated on the CD45− population for analysis in tumors and lungs; in thymi, given that 97% of the cells are CD45+, we gated on all cells. Siglece was selectively expressed within the primary tumor, both proteins were detected in lung metastases, and neither was detected in the thymus (online supplemental figures 5 and 6). These data demonstrate selective membrane protein expression of Lin28a and Siglece, consistent with our RNA-seq data.
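The tumor-restricted filter applied to the mouse RNA-seq data can be sketched as below. The gene-to-sample TPM layout and all names are hypothetical, and the sketch assumes that expression in any single tumor sample is enough to retain a gene, since the text does not state how many tumor samples were required.

```python
TPM_THRESHOLD = 2.0  # expression cutoff stated in the text


def tumor_restricted_genes(tpm, upregulated, tumor_samples, healthy_samples,
                           threshold=TPM_THRESHOLD):
    """Keep upregulated genes expressed in tumor but not in healthy tissue.

    tpm: dict mapping gene -> {sample_id: TPM}; missing entries count as 0.
    upregulated: genes significant (e.g. q < 0.05) versus paired thymus.
    Returns {gene: [tumor samples where TPM >= threshold]}.
    """
    hits = {}
    for gene in sorted(upregulated):
        values = tpm.get(gene, {})
        # Any expression in healthy mouse tissue disqualifies the gene.
        if any(values.get(s, 0.0) >= threshold for s in healthy_samples):
            continue
        expressed_in = [s for s in tumor_samples if values.get(s, 0.0) >= threshold]
        if expressed_in:  # assumption: one tumor sample above threshold suffices
            hits[gene] = expressed_in
    return hits
```

Returning the list of tumor samples per gene makes it easy to separate genes seen in both primary and metastatic samples (like Siglece) from those seen only in lung metastases (like Lin28a).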

To identify sequences to test as cancer vaccines for these two CT genes, we screened all substrings 25–30 amino acids long contained within the germline sequences of Siglece and Lin28a. As both MHCI and MHCII epitopes contribute significantly to the antitumor response,57 58 we identified all predicted MHCI and MHCII binding epitopes contained within each of these candidate sequences using NetMHCpan.37 We then scored each germline substring by the number of predicted MHCI and MHCII epitopes, empirically weighting strong binding sequences (top 0.5% and 1% of peptides for MHCI and MHCII, respectively) more heavily than weak binding sequences (top 2% and 5%, respectively) (figure 3G,H and online supplemental tables 7–10). We selected the two highest scoring distinct germline sequences for each of Siglece (S1, S2) and Lin28a (L1, L2) for synthesis as peptide vaccines, ensuring that these sequences contained epitopes predicted to be presented by both MHCI and MHCII, with the aim of stimulating robust CD4+ helper T-cell and CD8+ cytotoxic T-cell responses (see Methods). However, as S2 could not be synthesized due to its hydrophobicity and L1 failed to demonstrate efficacy in vivo (online supplemental figure 7), we focused subsequent analyses on the S1 and L2 peptide vaccines.
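A rough sketch of the window-scoring step follows. It enumerates every 25–30-residue window and sums weights over predicted epitopes fully contained in it: 1 for a strong binder and ¼ for a weak binder, using the percentile-rank cutoffs given in the text. It assumes NetMHCpan predictions have already been parsed into hypothetical `Epitope` records (start position, length, MHC class, percentile rank); parsing, allele choice, and the selection of distinct, non-overlapping top windows are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    start: int       # 0-based start position in the protein sequence
    length: int      # peptide length in residues
    mhc_class: str   # "I" or "II"
    rank: float      # percentile rank from the predictor (lower = stronger)


# (strong, weak) percentile-rank cutoffs from the text
CUTOFFS = {"I": (0.5, 2.0), "II": (1.0, 5.0)}
WEAK_WEIGHT = 0.25


def window_score(epitopes, w_start, w_end):
    """Score the window [w_start, w_end): strong binders count 1, weak binders 1/4."""
    score = 0.0
    for e in epitopes:
        if e.start < w_start or e.start + e.length > w_end:
            continue  # epitope not fully contained in this window
        strong, weak = CUTOFFS[e.mhc_class]
        if e.rank <= strong:
            score += 1.0
        elif e.rank <= weak:
            score += WEAK_WEIGHT
    return score


def rank_windows(seq_len, epitopes, min_len=25, max_len=30):
    """Score every window of min_len..max_len residues; best first."""
    results = []
    for length in range(min_len, max_len + 1):
        for start in range(seq_len - length + 1):
            results.append((window_score(epitopes, start, start + length), start, length))
    # Highest score first; ties broken by earlier start, then shorter window.
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return results
```

A window containing one strong MHCI binder and one weak MHCII binder therefore scores 1.25. Requiring top windows to include both MHCI and MHCII epitopes, as described above, would be an additional filter on these results.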

To confirm that these peptides can elicit T-cell proliferation in vitro, mice were inoculated with 100,000 4T1 cells on day 0 and presensitized subcutaneously (SC) with 100 µg of one of the CT peptides (S1 or L2) plus 100 µg of polyinosinic:polycytidylic acid (PIC) as an adjuvant on days −4, 2, and 7. Mice injected with only PIC or PBS, as well as naïve mice, were used as controls, and all mice were euthanized on day 14 post inoculation. Splenocytes were subsequently labeled with carboxyfluorescein succinimidyl ester (CFSE) and plated with PBS, PIC, S1, and L2. On day 5 after plating, proliferation indices were calculated for CD4+ and CD8+ T cells (figure 3I,J and online supplemental figure 8). Splenocytes from mice presensitized with either CT peptide showed robust proliferation of both CD4+ and CD8+ T cells compared with those from PIC-pretreated, PBS-pretreated, and naïve mice. Together, these results confirm the ability of our bioinformatic pipeline to identify CT genes containing germline sequences that are recognized as foreign neoantigens and that robustly induce both CD4+ and CD8+ T-cell responses.
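The text does not state how proliferation indices were defined. One common convention for CFSE dilution data, sketched below under that assumption, back-calculates the number of original precursors in each division peak (a cell in generation i descends from 1/2^i of a precursor) and divides the summed division rounds by the number of precursors that divided at least once. The input, a list of live-cell counts per CFSE peak, is hypothetical.

```python
def cfse_indices(gen_counts):
    """Return (division_index, proliferation_index) from CFSE peak counts.

    gen_counts[i] is the number of live cells in division peak i
    (i = 0 is the undivided parent peak).
    """
    # Cells in generation i arose from n / 2**i original precursor cells.
    precursors = [n / 2 ** i for i, n in enumerate(gen_counts)]
    total = sum(precursors)
    if total == 0:
        raise ValueError("no cells counted")
    divided = sum(precursors[1:])
    # Each precursor in generation i has undergone i rounds of division.
    divisions = sum(i * p for i, p in enumerate(precursors))
    division_index = divisions / total  # averaged over all precursors
    proliferation_index = divisions / divided if divided else 0.0  # responders only
    return division_index, proliferation_index
```

For example, 100, 200, and 400 cells in peaks 0, 1, and 2 correspond to 100 precursors each, giving a division index of 1.0 and a proliferation index of 1.5.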

In vivo antitumor immune response to CTA vaccination

Given that CTA vaccination induced both CD4+ and CD8+ T-cell proliferation in vitro, we next explored the ability of the vaccines to slow primary tumor growth in vivo. To this end, we investigated two vaccination schedules: a therapeutic dosing strategy meant to mimic treatment beginning post diagnosis and a preventative dosing strategy that begins with a prophylactic dose. Using the therapeutic dosing strategy, we injected 100,000 4T1 cells on day 0, followed by subcutaneous injection of one of our peptide vaccines with PIC adjuvant on days 3, 10, and 17 (figure 4A). We weighed the primary mammary tumors following sacrifice on day 28 and found significantly smaller tumors in mice treated with the S1, but not the L2, peptide vaccine (figure 4B). Using the preventative dosing strategy, with vaccination on days −4, 2, 7, and 14, we again found significantly smaller primary tumors in mice treated with the S1 peptide (figure 4C,D). Mice treated with S1 using the preventative dosing strategy tended to have smaller primary tumors at day 28 than those receiving therapeutic dosing (403±148 mg vs 604±272 mg (mean±SD), p=0.18 by Mann-Whitney U test). Of note, we did not expect primary mammary tumor growth to be altered by the L2 vaccine given the lack of Lin28a expression in primary tumors (figure 3E,F). These findings demonstrate the ability of S1 vaccination to slow in vivo tumor growth in an expression-dependent fashion.
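For small groups, the Mann-Whitney U test can be computed exactly by enumerating every way of splitting the pooled observations into two groups of the original sizes. The sketch below does this with midranks for ties. It is an illustrative implementation, not necessarily the software or tie handling the authors used, and because per-mouse tumor weights are not reported in the text, the usage example relies on toy values.

```python
from itertools import combinations


def midranks(values):
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided p) by full enumeration; small samples only."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    offset = n1 * (n1 + 1) / 2
    center = n1 * n2 / 2  # expected U under the null hypothesis
    u_obs = sum(ranks[:n1]) - offset
    dev_obs = abs(u_obs - center)
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        if abs(u - center) >= dev_obs - 1e-9:  # as or more extreme than observed
            extreme += 1
    return u_obs, extreme / total
```

With toy groups `[1, 2, 3]` and `[4, 5, 6]`, U is 0 and only 2 of the 20 possible splits are as extreme, so p = 0.1. Groups of five mice each would require 252 splits; for much larger groups a normal approximation is the usual choice.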

Figure 4

Cancer–testis antigen (CTA) vaccination reduces primary tumor growth by inducing targeted 4T1 cell death. (A) Schematic of the therapeutic dosing strategy, with vaccination starting on day 3 following 4T1 tumor inoculation. Additional CTA vaccine boosters were administered on days 10 and 17 prior to euthanasia on day 28. (B) Primary mammary tumor weights at sacrifice on day 28 after treatment with Siglece (S) and Lin28a (L) CTA vaccines under the therapeutic dosing regimen. (C) Alternatively, in the prevention vaccination regimen, the initial CTA vaccination was given 4 days prior to tumor inoculation, with subsequent boosters on days 2, 7, and 14 prior to euthanasia, again on day 28. (D) Mice treated with the S1, but not L2, CTA vaccine according to the preventative dosing strategy had significantly lower primary tumor size relative to Poly(I:C) (PIC) and phosphate-buffered saline (PBS) controls. (E) 4T1 cell death was significantly increased when cocultured with tumor infiltrates from mice treated with the S1 vaccine according to the preventative dosing strategy. No increase in cell death was observed with tumor infiltrates from mice treated with L2, consistent with expression of Siglece, but not Lin28a, in primary mammary tumors. For all panels, *p<0.05, **p<0.01, ***p<0.001 by one-way analysis of variance (ANOVA) with Tukey’s post hoc test; all other pairwise comparisons are not statistically significant.

To confirm that CTA vaccination slows primary tumor growth by increasing immune recognition and subsequent cytolysis of tumor cells, we next performed a functional cytotoxicity assay using tumor infiltrates. We cocultured CFSE-labeled primary 4T1 cells for 24 hours with cells isolated from the primary mammary tumors of mice treated with the prevention dosing strategy, at a 5:1 ratio of tumor-isolated cells to 4T1 cells. We then used flow cytometry to quantify the percentage of cell death among the CFSE-labeled primary 4T1 cells, adjusting for the basal level of cell death of 4T1 cells cultured alone. We found significantly elevated rates of 4T1 cell death when cocultured with cells from mice treated with S1 (figure 4E and online supplemental figure 9). As expected, given the lack of Lin28a expression in primary tumors, no increase in primary 4T1 cell death was observed with infiltrates from mice treated with the L2 vaccine (figure 4E).
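The text says only that death was adjusted for the basal level in 4T1 cells cultured alone. One common normalization, borrowed from spontaneous-release correction in cytotoxicity assays, is sketched below as an assumption; a plain subtraction of the basal rate would be an equally plausible reading. The per-event `(cfse_positive, dead)` pairs are a hypothetical stand-in for gated flow data.

```python
def percent_dead(events):
    """Percent dead among CFSE-positive (4T1 target) events.

    events: iterable of (cfse_positive, dead) boolean pairs, one per event.
    """
    targets = [dead for cfse_positive, dead in events if cfse_positive]
    if not targets:
        raise ValueError("no CFSE-positive events")
    return 100.0 * sum(targets) / len(targets)


def specific_killing(dead_coculture_pct, dead_alone_pct):
    """Death attributable to effector cells, scaled to cells alive at baseline.

    (observed - basal) / (100 - basal) * 100; negative values mean less death
    than in 4T1 cells cultured alone.
    """
    if not 0.0 <= dead_alone_pct < 100.0:
        raise ValueError("basal death must be in [0, 100)")
    return (dead_coculture_pct - dead_alone_pct) / (100.0 - dead_alone_pct) * 100.0
```

For example, 40% death in coculture against a 20% baseline gives (40 − 20)/(100 − 20) × 100 = 25% specific killing, whereas simple subtraction would report 20%.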

Given the cytotoxic activity of S1-treated tumor infiltrates, we next explored T-cell migration and function in the tumor immune microenvironment on day 28 following the preventative dosing schedule. We first used flow cytometry to quantify the percentage of cytotoxic CD8+ T cells within tumor cell isolates relative to the total number of CD45+ leukocytes. While S1-vaccinated tumors tended to have higher relative fractions of cytotoxic T cells, this effect was not significant. Interestingly, L2 vaccination significantly increased the proportion of CD8+ T cells relative to PIC-treated and PBS-treated controls (figure 5A). We additionally found that vaccination with either S1 or L2 increased the frequency of CD4+ T cells among all CD45+ leukocytes relative to PBS alone (figure 5B).

Figure 5

Cancer–testis antigen vaccination stimulates a robust CD4+ and CD8+ T-cell antitumor immune response. (A) Tumors from mice treated with the L2 vaccine according to the prevention dosing regimen had a significantly increased proportion of CD8+ cytotoxic T cells among tumor-infiltrating CD45+ leukocytes. (B) Similarly, a higher proportion of tumor-infiltrating CD4+ helper T cells relative to all CD45+ leukocytes was observed following S1 and L2 preventative dosing. (C) Dot plot showing a significantly higher percentage of granzyme B-producing CD8+ T cells in tumors following treatment with either the S1 or L2 vaccine. Dot size indicates the percentage of cells expressing granzyme B, while dot color indicates the average expression level as measured by mean fluorescence intensity. (D) Similarly, a significantly higher fraction of CD4+ and CD8+ T cells expressed interferon-gamma following both S1 and L2 vaccination, with significantly higher average expression in CD4+ T cells for both vaccines. For all panels, *p<0.05, **p<0.01, ***p<0.001 by one-way analysis of variance (ANOVA) with Tukey’s post hoc test; all other pairwise comparisons are not statistically significant. PBS, phosphate-buffered saline; PIC, Poly(I:C).

To examine the functionality of these tumor-infiltrating lymphocytes, we additionally compared the expression of granzyme B (GZMB) and interferon-gamma (IFNG) by treatment group. Primary mammary tumors in mice treated with either S1 or L2 had significantly higher percentages of GZMB-producing CD8+ T cells. However, vaccination did not significantly increase the average expression level of GZMB among tumor-infiltrating CD8+ T cells (figure 5C and online supplemental figure 10). In contrast, S1 and L2 vaccination significantly increased both the proportion of IFNG-producing CD4+ T cells and the average expression of IFNG within these CD4+ T cells (figure 5D and online supplemental figure 11). Interestingly, the proportion of IFNG-producing CD8+ T cells was significantly increased in response to S1 vaccination only (online supplemental figure 11C). Other than this distinct population of cytotoxic CD8+ T cells, no additional differences were observed between mice vaccinated with S1 and L2 in the proportion of cells expressing GZMB and IFNG or in their average expression levels. In summary, these findings demonstrate that the S1 vaccine induces robust CD4+ and CD8+ T-cell antitumor functions that contribute to increased tumor cell death and reduced tumor growth in this mouse model of TNBC.

CTA vaccination drastically reduces the number of pulmonary metastases

Both Siglece and Lin28a are significantly expressed in lung metastases in this 4T1 mouse model of TNBC (figure 3E,F and online supplemental figures 5 and 6). We therefore asked whether CT gene vaccination could slow or prevent the development of metastatic disease. Gross pathologic examination of lungs harvested on day 28 from mice treated with the preventative vaccination schedule showed a significant reduction in metastatic disease burden in mice treated with either S1 or L2 (figure 6A). This reduction in the number of lung metastases was further supported by histologic analysis (figure 6B). Quantification of the number of observable lung metastases in each sample confirmed a significant reduction in the number of metastatic sites in mice treated with either the S1 or L2 peptide vaccine (figure 6C).

Figure 6

Cancer–testis (CT) antigen vaccination significantly reduces lung metastases. (A) Representative photos showing the extent of lung metastases on gross examination and (B) histological examination with H&E staining of lungs harvested on day 28 from mice treated according to the preventative dosing strategy. Scale bars correspond to 2 mm. (C) Quantification demonstrates significantly reduced numbers of lung metastases following treatment with either the S1 or L2 CT antigen vaccine. (D) Representative photos of 4T1 colonies formed from lung tissue harvested on day 28 following the preventative dosing strategy. (E) The number of 4T1 colonies was significantly reduced following treatment with the S1 and L2 vaccines. For all panels, *p<0.05, **p<0.01, ***p<0.001 by one-way analysis of variance (ANOVA) with Tukey’s post hoc test; all other pairwise comparisons are not statistically significant. PBS, phosphate-buffered saline; PIC, Poly(I:C).

To further characterize the ability of CTA vaccination to reduce the number of lung metastases, we performed a clonogenic assay. Briefly, mice were again treated according to the prevention strategy, and single-cell suspensions were prepared from lungs harvested on day 28. We then plated 200 cells in complete media supplemented with 6-thioguanine, to which 4T1 cells are resistant, and counted the number of 4T1 colonies after 14 days (figure 6D). Quantification of the number of colonies from each treatment group again demonstrated a significant reduction in the number of lung 4T1 colonies from mice treated with either the S1 or L2 peptide vaccine (figure 6E). These findings further indicate a lower overall metastatic disease burden in mice treated with our CTA vaccines. Altogether, these findings show a striking and significant reduction in the number of lung metastases following vaccination.

Discussion

The capacity of the immune system to specifically attack neoplastic cells makes it a powerful weapon for the long-term control of many different cancer types. T-cell recognition of high-quality neoantigens underlies natural antitumor immunity and response to immunotherapy.59–61 Current ICB strategies have transformed our understanding of the tumor immune microenvironment and the clinical treatment of many high tumor mutational burden cancer types. However, these immunotherapies do not allow for targeted treatment of distinct tumor-specific neoantigens, and their clinical use is consequently often limited by off-target autoimmunity.62 Personalized cancer vaccines, which allow for induction of neoantigen-specific immunity, have shown promise in early clinical trials.3 4 These somatic mutation neoantigen targets, however, are frequently unique to each tumor, so clinical treatment requires labor-intensive bioinformatic analyses to design and subsequently synthesize a custom vaccine for each patient. Targeting immunogenic germline sequences would instead enable the design of shared vaccines applicable to all tumors expressing the target gene, with CT genes representing a particularly intriguing source.

Cancer vaccines targeting CTAs have shown promise in melanoma,63–66 esophageal,67 gastric,68 69 head and neck,70 biliary,71 72 and pancreatic73 cancers, though recent phase III trials targeting melanoma-associated antigen 3 in the adjuvant setting for melanoma74 and non-small cell lung cancer75 were well tolerated but did not improve disease-free survival. Additional work is therefore needed to explore the efficacy of other CT genes, as well as alternative therapeutic strategies. Of particular interest will be the use of CT vaccination in patients with unresectable disease and in the neoadjuvant setting; ICB, for example, has recently been shown to significantly prolong event-free survival when administered both before and after surgical resection compared with adjuvant therapy alone.76 Finally, treatment with multiple CT targets may help mitigate antigenic escape,77 and combination treatment with immune checkpoint inhibitors and other immunomodulators may further improve response rates.78 79

Despite this interest in the use of CT epitopes as neoantigen targets for immunotherapy,14–16 there has been no consensus method for identifying CT genes.29–31 The GTEx and TCGA sequencing repositories have enabled high-throughput screening for CT gene expression patterns,35 36 though these studies have not focused on the identification of immunogenic targets for immunotherapy. In this study, we used the Phi Correlation Coefficient41 with a highly conservative threshold to identify genes with testis-exclusive expression across the GTEx healthy tissue database. We additionally screened this set of testis-exclusive genes for human thymic expression,43 narrowing it to genes without detectable thymic expression, which are consequently the most likely to escape central tolerance. We then defined our final set of 103 CT genes as those testis-exclusive genes without thymic expression that were expressed at a threshold of 2 TPM in >5% of tumor samples in at least two separate tumor types. Finally, we confirmed that the expression of these CT genes in the GDC pan-cancer cohort was correlated with more advanced tumor stage, markers of antitumor immunity, and overall survival.
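The three criteria summarized above can be sketched for a single gene as follows. This is a simplified illustration under stated assumptions: expression is binarized at 2 TPM before computing the phi coefficient against a testis/non-testis label, the conservative phi cutoff (not given here) must be supplied by the caller, and "detectable" thymic expression is assumed to use the same 2 TPM cutoff. All function and parameter names are hypothetical.

```python
from math import sqrt


def phi_coefficient(a, b):
    """Phi coefficient of two equal-length binary (bool) vectors."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def is_ct_gene(gtex_tpm, gtex_is_testis, thymus_tpm, tumor_tpm_by_type,
               phi_cutoff, tpm_cutoff=2.0, min_frac=0.05, min_types=2):
    """Apply the three CT criteria to one gene.

    gtex_tpm / gtex_is_testis: per-sample TPM and testis flag across GTEx.
    thymus_tpm: per-sample thymic TPM; tumor_tpm_by_type: {type: [TPM, ...]}.
    """
    # 1. Testis-exclusive: binarized expression tracks the testis label.
    expressed = [v >= tpm_cutoff for v in gtex_tpm]
    if phi_coefficient(gtex_is_testis, expressed) < phi_cutoff:
        return False
    # 2. No detectable thymic expression (same cutoff reused as an assumption).
    if any(v >= tpm_cutoff for v in thymus_tpm):
        return False
    # 3. Expressed in > 5% of samples in at least two tumor types.
    n_types = sum(
        1 for values in tumor_tpm_by_type.values()
        if values and sum(v >= tpm_cutoff for v in values) / len(values) > min_frac
    )
    return n_types >= min_types
```

Ordering the checks from cheapest to most data-hungry is a convenience here; the three filters are independent, so the result does not depend on their order.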

While other studies have used the GTEx and TCGA databases to identify CT genes, there are currently no standardized criteria for CT expression. For example, Wang et al used a specificity measure80 that uses a scalar projection in high-dimensional tissue space to estimate testis-specific expression independently of absolute expression levels. Genes scoring above a given threshold and expressed in 1% of tumor samples at a threshold of 5 normalized read counts were considered to have CT expression. Alternatively, others have defined a proportionality score that identifies CT expression if testis samples account for at least 90% of a gene’s expression across all tissue types and the gene is expressed at RSE (relative standard error) >1 in >10% of tumor samples.36 81 82 These definitions did not consider thymic expression and resulted in the identification of 1019 and 745 CT genes, respectively. The bioinformatic pipeline presented herein aimed to identify a narrower subset of CT genes of particular interest as both immunogenic and generalizable antigen targets.

TNBC is extremely aggressive and associated with poor prognosis and a higher risk of early recurrence and metastasis. Because TNBC lacks targetable hormone receptor expression, the standard of care is chemotherapy. Yet, 50% of patients with TNBC are insensitive to chemotherapy, and most patients relapse and develop metastases within 3 years. A comprehensive analysis using a larger cohort of samples from various stages would be a necessary next step. Nonetheless, in support of these findings, we developed peptide vaccines against Siglece and Lin28a, which are expressed in the 4T1 mouse model of TNBC. Mice sensitized with a peptide vaccine targeting Siglece had significantly reduced primary tumor sizes. We further demonstrated that this reduction in primary tumor growth was mediated by both CD4+ and CD8+ T-cell function and subsequent tumor cell death. However, quite strikingly, while both S1 and L2 vaccination resulted in an apparent increase in CD4+ and CD8+ T cells that produced IFNG and GZMB, respectively, T cells induced by L2 did not appear functional against the primary tumor, as neither reduced primary tumor weights nor increased tumor cell killing was observed (figure 4). Together, these data suggest that L2 elicits a ‘minimal’ immune response that is neither selective nor functional, and that functional antitumor activity may depend on the ability to generate IFNG-producing CD8+ cytotoxic T cells (online supplemental figures 10 and 11). We further found that vaccination against both Siglece and Lin28a, which are expressed by metastatic tumors, significantly reduced the number of lung metastases in mice. Lin28a/b has previously been implicated in breast cancer metastasis,83 84 whereas little is known about Siglece function in cancer.

CTAs are normally expressed only in the testis but are highly expressed across cancers, where their expression is associated with disease stage, unfavorable prognosis, and cancer invasion, making them potentially promising targets. Notably, CTA vaccination induced both CD4+ and CD8+ T-cell responses, a promising direction for the future development of peptide vaccines for immunotherapy. Further studies are needed to ascertain the ability of CTAs with high immunogenicity and specificity to serve as robust targets for cancer immunotherapy across different cancer types and as part of combination therapies targeting multiple CTAs, as well as the ability of ICB or other immunotherapeutic strategies to improve CTA-based therapies.

In summary, CT genes represent an intriguing source of targetable antigens that may enable the development of new immunotherapeutics. In this study, we implement a novel bioinformatic pipeline for high-throughput screening of the GTEx and TCGA databases to identify high-confidence testis-exclusive genes that are likely to lack central tolerance and are robustly expressed across at least two tumor types. We validate the immunogenicity of two CT gene orthologs using a mouse model of TNBC, with vaccination inducing tumor-specific immunity in an expression-dependent manner. Further exploration of these immunogenic CT genes as vaccination targets in human tumors will be of particular interest, as will the identification and validation of additional antigen-based vaccines that target immunogenic tumor-specific CT genes for the treatment of TNBC.

Data availability statement

Data are available upon reasonable request. All data relevant to the study are included in the article or uploaded as online supplemental information.

Ethics statements

Patient consent for publication

Ethics approval

This study (2019-005) was approved by the Institutional Animal Care and Use Committee (IACUC) of the Feinstein Institutes for Medical Research.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • JAC and BM contributed equally.

  • Contributors BJB and GSA conceived of and designed the project. JAC, BDH and GSA performed the bioinformatics analysis. BM, JB, CS and ML performed all experiments and analyzed the data. JAC, BM, BJB and GSA wrote the manuscript and analyzed the data. All authors edited the manuscript and approved the final version. BJB and GSA accept full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.

  • Funding JAC was partially supported by NIHGM MSTP Training award T32-GM008444. GSA was funded by the Simons Foundation, an LIBH grant, and the Stand Up To Cancer-Breast Cancer Research Foundation Convergence Team Translational Grant Number 310 SU2C-BCRF 2015-001. BJB was funded by the Department of Defense Breast Cancer Research Program W81XWH-19-1-0113, the Manhasset Women’s Coalition Against Breast Cancer, a Northwell-Cold Spring Harbor Collaborative Research Grant, and by the philanthropy of Samantha and Adam Gordon.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.