Article Text

Original research
Intratumoral T-cell and B-cell receptor architecture associates with distinct immune tumor microenvironment features and clinical outcomes of anti-PD-1/L1 immunotherapy
  1. Aimilia Schina1,
  2. Zsofia Sztupinszki2,
  3. Inge Marie Svane1,
  4. Zoltan Szallasi2,
  5. Göran Jönsson3 and
  6. Marco Donia1
  1. 1National Center for Cancer Immune Therapy (CCIT-DK), Department of Oncology, Copenhagen University Hospital, Herlev, Herlev, Denmark
  2. 2Danish Cancer Society Research Center, Copenhagen, Denmark
  3. 3Department of Clinical Sciences Lund, Lund University, Lund, Sweden
  1. Correspondence to Dr Marco Donia; marco.donia{at}


Background Effective cooperation between B-cells and T-cells within the tumor microenvironment may lead to the regression of established tumors. B-cells and T-cells can recognize tumor antigens with exquisite specificity via their receptor complexes. Nevertheless, whether a diverse intratumoral B-cells and T-cell receptor (BCR, TCR) repertoire affects the tumor immune microenvironment (TIME) and clinical outcomes in patients treated with immunotherapy is unclear.

Methods We extracted information on BCR and TCR repertoire diversity from large clinical datasets and measured the association between immune receptor diversity features, the TIME, and clinical outcomes of patients treated with anti-PD-1/PD-L1 immunotherapy.

Results In multiple tumor types, an increasingly diverse TCR repertoire was strongly associated with a highly activated TIME, while BCR diversity was more associated with antibody responses but not with the overall B-cell infiltration nor with measures related to intratumoral CD8+T cell activity. Neither TCR nor BCR diversity was independent prognostic biomarkers of survival across multiple cancer types. However, both TCR and BCR diversity improved the performance of predictive models combined with established biomarkers of response to immunotherapy.

Conclusion Overall, these data indicate a currently unexplored immunological role of intratumoral B-cells associated with BCR diversity and antibody responses but independent of classical anticancer T-cells intratumoral activities.

  • Receptors, Immunologic
  • Immune Checkpoint Inhibitors
  • Biomarkers, Tumor
  • B-Lymphocytes
  • T-Lymphocytes

Data availability statement

Data are available in a public, open access repository. Data are available on reasonable request. The data analyzed and processed in this study are available from the TCGA Research Network:, and from the clinical datasets available in the following public repositories: the Sequence Read Archive (GSE78220 and GSE91061), the European Nucleotide Archive (PRJEB23709 and PRJEB25780), the database of Genotypes and Phenotype (phs000452.V.3.p1, phs001493.V.1.p1, and phs001919.V.1.p1), and the European Genome-phenome Archive (EGAS00001002928, EGAS00001002556, EGAS00001004445). Restrictions apply to the availability of part of these data, which were used under license for this study. Data are available from the authors upon reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See

Statistics from

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.


  • T-cells and B-cells are key players in the immunogenic tumor microenvironment.


  • B-cell receptors diversity may impact immune response to cancer and influence the clinical outcomes of immunotherapy through antibody production.


  • B-cell receptor diversity and antibody production are emerging components of the tumor immune microenvironment.


Treatment with immune checkpoint inhibitors (ICIs) targeting the PD-1 or PD-L1 pathway resulted in unprecedented rates of durable clinical responses in multiple types of solid tumors. Nevertheless, most patients treated appear not to benefit from current ICIs.1 Extraordinary efforts to elucidate mechanisms of response and resistance have resulted in identifying several molecular biomarkers that can influence responses and be associated with clinical outcomes. Among such biomarkers, those identifying tumors with a pre-existing but suppressed immune response (T-cell inflamed or immunologically ‘hot’ tumors) or with a high level of potential foreign target-antigens appear to reach a clinically meaningful level of response prediction.2 Gene expression signatures to identify those tumors with an infiltrated-inflamed tumor immune microenvironment (TIME),3 such as high T-cell-inflamed signature gene expression profile (TIS-GEP), PD-L1, or signatures of tumors under immune attack)4–7 and genomic profiles indicating a high number of tumor foreignness, such as increased tumor mutational burden (TMB), being correlated to the number of potential neoantigens5 6 8–10 could enrich for patients who respond to ICIs. While TIS-GEP displayed a good level of correlation to intratumoral PD-L1 expression across multiple datasets,5 6 11 neither TIS-GEP nor PD-L1 expression was correlated to TMB.5 6 12 13 Hence, these biomarkers appeared to capture distinct biological features of T-cell activation and tumor antigenicity.

T-cell and B- cell immune receptors (TCR and BCR) can recognize antigens, including tumor antigens, with exquisite specificity and initiate the cascade of cellular and humoral adaptive immune responses. Whereas cytotoxic CD8+T cells (CTLs) endowed with tumor-specific TCRs are responsible for the central pathway leading to tumor regression,14 other cell types can regulate CTL formation and intratumoral activity. For example, intratumoral B-cells can prime T-cells via professional antigen presentation within tertiary lymphoid structures (TLS); however, the same cell type can also release antibodies with direct antitumor potential after exquisite antigen recognition via the BCR.15 While a more diverse intratumoral TCR repertoire has previously been associated with an increased abundance of pre-existing intratumoral CD8+T cells expressing CD8, PD1, and PD-L1 within melanoma16 and lung cancer,17 the pleiotropic functions of B-cells warrant further investigation on the intratumoral B-cell repertoires.

Here, we applied comprehensive bioinformatics and machine learning analyses to large transcriptomic datasets of tumor biopsies, shedding light on the interplay between T-cell and B-cell immune receptors’ architecture and their influences on TIME and response to current immunotherapies.


Data sources

We used RNA sequencing and clinical data from 30 cancer types from The Cancer Genome Atlas (TCGA) and 10 (in public or restricted access repositories) datasets, including patients treated with anti-PD-1 or anti-PD-L1 (PD-1/PD-L1) antibodies. The study descriptions of the RNA-Seq datasets and corresponding cancer type, treatment, and patient information are listed in online supplemental table S1. An extended description of the data sources can be found in online supplemental methods.

Supplemental material

Supplemental material

Transcriptomic analysis

Raw and processed counts from all anti-PD-1/L1 studies and TCGA were used to extract the TIS-GEP score, immune cell subtypes infiltration scores, and tumor purity. A detailed description of the RNA sequencing pipeline, methods, and tools used, the differential expression (DEA), gene ontology (GO) enrichment, and similarity analyses can be found in online supplemental methods.

TCR and BCR sequencing and estimation of diversity measures

TCR and BCR repertoire were analyzed from paired-end FASTQ files using the MiXCR tool V.3.0.618 in RNA-Seq mode. Alignment was performed using all reference V, D, J, and C genes of T-cell and B-cell receptors (TCR/BCR), and clonotypes were assembled to extract CDR3 regions. Due to the low detection of clone reads from bulk RNA-Seq data, further downsampling was not possible, thus leading to additional filtering by keeping samples with a minimum of four TCR clones and ten BCR clones. The diversity measure of richness was used to quantify TCR/BCR diversity. See online supplemental methods for further details.

Survival analysis

We used data from TCGA and available datasets of patients treated with anti-PD-1/PD-L1 of four histologies; SKCM (Skin Cutaneous Melanoma), KIRC (Kidney renal clear cell carcinoma), BLCA (Bladder Urothelial Carcinoma) and STAD (Stomach adenocarcinoma). All patients with available information that received immunotherapy were removed from the TCGA data. Each biomarker was assessed through separate univariable Cox regressions for predicting overall survival (OS) in each histology, and we incorporated all factors into tissue-specific multivariable Cox proportional hazard models to determine their independence. We did not include clinical characteristics with known prognostic value in the multivariable Cox models since they were not available in our data. Thus, the evaluation of independent factors was limited by the available features in the datasets used. Continuous variables were transformed to categorical by stratifying the data with the median score values for each predictor factor. The STAD anti-PD-1/L1 cohort did not include any information for logTMB; thus, univariable analysis was not performed for this factor. In addition, other critical patient-related factors, such as age and gender, were excluded from our study because of a high proportion of patients with missing data in a dataset-specific manner (online supplemental figure S2). Possible effects of multicollinearity were not dealt with during survival analysis, and the selection of predictors in the multivariable Cox models was out of the scope of our study. No interaction terms were included in the multivariable models. A minimum of seven events per predictor variable was set for all survival analyses since we assumed that smaller datasets do not have sufficient power to detect covariates that significantly impact survival. For unbiased estimates, we calculated bootstrapped confidence intervals and p values from ten thousand replicates in the multivariable analyses using the resamples with replacement method (case resampling). We used the R package survivalAnalysis to perform the univariate and multivariate Cox regression analysis and the R package boot.pval to get bootstrapped confidence intervals and p values.

Supplemental material

Regression and machine learning analyses

Data from 10 datasets, including 402 patients treated with anti-PD-1/anti-PD-L1 monotherapy in SKCM, KIRC, BLCA, and STAD, were used to train and then validate histology non-specific and histology-specific models predicting objective response to immunotherapy. To increase the robustness of the analyses on objective responses, only patients classified with complete response (CR), partial response (PR), or progressive disease (PD) as a best overall response were included in the predictive models. We applied logistic regression to evaluate the performance of TCR and BCR richness as standalone predictors and compared them to the performance of known established biomarkers logTMB, PD-L1, and TIS-GEP. We additionally developed multivariable logistic regression models using all combinations of TCR, BCR richness, logTMB, PD-L1, and TIS-GEP. Finally, we integrated all nine biomarkers included in our study (TCR, BCR richness, logTMB, PD-L1, TIS-GEP, CD4+, and CD8+T cells infiltration, B-cells infiltration, and tumor purity) into multivariable models, using recursive feature elimination (RFE) as a robust feature selection method combined with six regression and machine learning algorithms. Our analysis focused on assessing the biomarkers in histology non-specific and histology-specific settings, and datasets with histology-specific restrictions were only used in their parent histology. More details are included in online supplemental methods.

All predictive models were built in R using the caret package.19 Receiver operating characteristic curve-area under the curve (ROC-AUC) calculations were made using the precrec R package, and comparing the ROC-AUC curves using the bootstrap method was performed using the pROC R package. The threshold for significance for the extracted p value of the test was set to 0.05.

Statistical analysis

All statistical analyses were carried out using R V.4.1.0 ( We considered the level of statistical significance to be 0.05 across all our analyses. Spearman’s correlation was used in all correlation analyses since it is a non-parametric method not based on the assumption of normality. Significant correlations were reported, with the significance level set to 0.05. Reporting naming of the strength of correlation coefficients was based on Hinkle et al,20 as shown in online supplemental table S2.


Study biomarkers and overall approach

Contemporary RNA sequencing datasets, including hundreds of tumor biopsies, are widely available, and this methodology is widely adopted across laboratories for potential mass-scale use. Therefore, we primarily investigated TIME biomarkers that could be measured from RNA-Seq. We performed a comprehensive immune repertoire diversity analysis, based on TCR and BCR richness, in TCGA (30 histological tumor types, pan-cancer analyses) and 10 additional datasets, including pretreatment samples of patients who received anti-PD-1/L1 immunotherapy (4 histological types, multicancer analyses). The selection of the richness metric for TCR and BCR is extensively discussed in the Methods section. In addition, other TIME biomarkers with potential predictivity of response to anti-PD-1/anti-PD-L1 that can be measured with RNA-Seq were selected for this study. Based on the robustness of such biomarkers and previous extensive use in other studies, we selected seven additional factors such as intratumoral infiltration of CD8+T cells,21 TIS-GEP score,4–6 22 PD-L1 expression,13 TMB,10 23 intratumoral infiltration of CD4+T cells24–26 intratumoral infiltration of B-cells27–30 and tumor purity.31 32 In addition, TMB estimated based on another widely available technology (DNA sequencing or targeted gene-panel sequencing) was included because of its proven value in immunotherapy. Figure 1 presents an overview of the study design.

Figure 1

Design of the TCGA and anti-PD-1/L1 biomarkers study. ICI, immune checkpoint inhibitor; ORR, objective response rate.

TIME features associate with each other in a more patient than cancer-type-specific manner

To understand how the intratumoral immune receptor diversity shapes the TIME, we assessed correlations among T-cell and B-cell immune receptor repertoire, genomic, transcriptomic, and immune infiltration in the tumor microenvironment (TME). Summarizing network plots of intra-pan-cancer cohort and multicancer cohort correlation values across selected TCGA tumor types are shown in figure 2A,B and the individual four-type multicancer correlations in online supplemental figure S3. In both pan-cancer and multicancer cohorts, we observed that PD-L1 expression, TIS-GEP, and tumor purity had the highest absolute correlation magnitudes, with moderate to high associations with each other (figure 2A,B). TCR richness had moderate correlations to these biomarkers yet low correlation to CD8+ and CD4+ T cells infiltration (rSpearman<0.45, p<0.05; figure 2A,B). We additionally observed a low correlation between intratumoral B-cell repertoire richness and B-cell infiltration (rSpearman<0.35, p<0.05; figure 2A,B). Immune cell infiltration biomarkers (CD8+, CD4+T cells, B-cells) and logTMB were the least ‘clustered’, that is, having the lowest overall correlation with the rest of the features of the TME (figure 2A,B). In SKCM and KIRC cohorts, TCR richness displayed higher correlation values with intratumoral CD8+T cell infiltration (rSpearman>0.5, p<0.05; online supplemental figure S3A,B). TCR and BCR richness were moderately intercorrelated in all analyses, except for the KIRC cohort (0.59<rSpearman<0.7, p<0.05; figure 2A,B and online supplemental figure S3). Despite minor heterogeneity in analyses across four individual tumor types, the overall correlation patterns followed the merged pan-cancer and multicancer results (online supplemental figure S3). Complete correlation analysis, including statistical significance results, can be found in online supplemental file 1.

Supplemental material

Supplemental material

Figure 2

Biomarker associations within cancer cohorts (pan-cancer/multicancer) and across cancer types. Biomarkers pairwise correlations network plots (A) Pan-cancer: across 30 TCGA cancer types (merged). (B) Multicancer (four cancer types, merged): skin cutaneous melanoma (SKCM), kidney renal clear cell carcinoma (KIRC), bladder urothelial carcinoma (BLCA), stomach adenocarcinoma (STAD). Each point represents a variable/biomarker, each path the correlation between the two variables it joins and the proximity of variables the overall magnitude of their correlations. Highly correlated biomarkers appear to cluster together and are connected with stronger paths (wider and less transparent). The path’s color indicates positive (red) and negative (blue) correlations and the biomarkers’ proximity was determined using multidimensional clustering of absolute correlation values. Only significant Spearman correlations are shown (p<0.05). T-cell receptor (TCR) and B-cell receptor (BCR) richness values in all TCGA patients. Boxplots and points show summary statistics, and individual (C) TCR richness values in each cancer type, ranked by median CD8+T cells absolute infiltration scores, (D) TCR richness values in each cancer type, ranked by median T-cell inflamed signature gene expression profile (TIS-GEP) scores, (E) BCR richness values in each cancer type, ranked by median B-cells absolute infiltration scores. The red dotted line represents the right-aligned rolling sum of mean TCR/BCR richness across the ranked TCGA cohorts. TCGA, The Cancer Genome Atlas.

In summary, while PD-L1, TIS-GEP, (low) tumor purity, TCR richness, and (in part) high CD8+T cell infiltration correlated within each patient and, therefore, may represent biomarkers of the same biological phenomenon, other TME features such as TMB, B cell infiltration and measures of BCR diversity did not appear strongly correlated therefore may inform on independent intratumoral biology. Overall, similar associations were found across tumor types.

Immune repertoire diversity measures do not necessarily reflect classical TIME measures

Interhistology comparisons confirmed that tumor histological types with high median CD8+ infiltration did not necessarily display high average TCR diversity (figure 2C). Confirming intratumor analyses, tumor types with a high intratumoral T-cell activity measured by TIS-GEP score had a predominantly more diverse TCR repertoire (figure 2D). Interestingly, the quantity of tumor-infiltrating B-cells was not higher in tumor types with a more diverse intratumoral B-cell repertoire (figure 2E), and the diversity of the BCR repertoire did not associate with higher CD8+T cells infiltration or TIS-GEP score (figure 3A,B). At the individual patient level, both TCR and BCR richness displayed significant intersample variability within and across most tumor types, where subsets of patients with elevated TCR or BCR richness could be identified in almost all tumor types (figure 2C–E).

Figure 3

BCR richness and intratumoral immune response. Boxplots of BCR richness values in all TCGA patients, points show summary statistics, and individual BCR richness values in each cancer type, ranked by median (A) CD8+T cells absolute infiltration scores, (B) TIS-GEP scores. The red dotted line represents the right-aligned rolling sum of mean BCR richness across the ranked TCGA cohorts. (C) Gene ontology (GO) dotplots of the significantly enriched GO biological processes identified under the contrast setting of high versus low BCR richness samples in SKCM, KIRC, BLCA, and STAD. Only upregulated genes (log fold change; LFC>0, padjusted<0.05) were used to extract enrichment GO categories. Enriched terms were first filtered for the focused GO terms set; then, redundant terms were removed by similarity analysis (see the Methods section). The dot size represents the gene count in the enriched biological process, and the dot color shows the enriched biological process significance (p adjusted). The color of the GO terms indicates the broader category (from the focused set) the terms belong to; red: immunoglobulin production, light green: antigen receptor-mediated signaling, dark green: antigen processing and presentation, blue: lymphocyte activation involved in immune response, brown: type II interferon production, orange: response to type II interferon. BCR, B-cell receptor; BLCA, bladder urothelial carcinoma; KIRC, kidney renal clear cell carcinoma; SKCM, skin cutaneous melanom; STAD, stomach adenocarcinoma; TCGA, The Cancer Genome Atlas; TIS-GEP, T-cell-inflamed signature gene expression profile.

Overall, the TCR repertoire diversity had a moderate to strong association with intratumoral T-cell activity (measured with the TIS-GEP) but a low association with the CD8+T cell infiltration. Contrary, BCR diversity had only low or negligible association with measures of the TIME (measured with CD8+T cells infiltration and the TIS-GEP), including the number of intratumoral B-cells.

BCR diversity strongly associates with enriched pathways of ongoing antibody production but has low association to CD8+ T-cell immunity

To investigate further the link between BCR diversity and intratumoral immune responses, we performed DEA between samples with high and low BCR richness in the four tumor types of the multicancer cohort. Upregulated genes were used to identify enriched GO biological processes in four tumor types, focusing on processes of lymphocyte activity, antigen processing, presentation and signaling, immunoglobulin production, and interferon gamma (IFNg) production/response to IFNg. In virtually all tumor types, high BCR richness was associated with a significant upregulation of pathways related to antibody production and BCR-mediated signaling pathways (figure 3C). On a different note, some (low) levels of enrichment we also observed for antigen-processing and presentation alongside pathways related to lymphocyte activation (figure 3C); in contrast, pathways related to CD8 T cell recruitment, differentiation, and activity were consistently strongly upregulated, and enriched in samples with high PD-L1 expression or CD8+T cell infiltration (online supplemental figure S4), which were used as control cases. Extensive tables with the results of the DEA, GO enrichment, and similarity analysis can be found in online supplemental files 2 and 3.

Supplemental material

Supplemental material

Supplemental material

These results highlight the lack of strong associations between BCR repertoire diversity and T-cell immunity while indicating a potential role of BCR richness in influencing the ability to produce cancer-related antibodies within tumors.

TCR and BCR richness do not associate with the objective response rate or predict OS independently following Anti–PD-1/L1 therapy

Previous analyses have shown that the average level of TMB within a given tumor histological type is associated with the objective response rate (ORR).10 We observed a weak, non-significant, positive correlation between median TCR or BCR richness and average ORR to anti-PD-1/L1 across TCGA tumor types (figure 4A,B). Other parameters analyzed in this study, such as CD4+, CD8+infiltration, TIS-GEP, and PD-L1, were all positively correlated to ORR, with higher but still weak (r<0.4) correlation values across tumor types (online supplemental figure S5B–E). Intratumoral B-cells infiltration and tumor purity had a negative, weak correlation to the ORR (online supplemental figure S5F,G).

Supplemental material

Figure 4

Assessment of the effect of TCR/BCR richness and other biomarkers on clinical outcomes across cancer types. Objective response rate correlations with (A) median TCR richness, (B) median BCR richness across 26 TCGA tumor types. The dot size represents the number of patients evaluated for each TCGA cohort. Both Spearman and Pearson correlation coefficients are shown. TCGA study abbreviations are reported at (C) HRs of univariableand multivariable Cox regression analysis in TCGA and anti-PD-1/L1 monotherapy clinical datasets. HR sizes are shown on the y-axis, and the biomarkers/factors that are investigated are shown on the x-axis. The type of survival analysis (univariable or multivariable) is annotated with different shapes; circle: univariable in TCGA, square: univariable in anti-PD-1/L1, triangle: multivariable in anti-PD-1/L1. The significance of the biomarker is shown with colors; gray: nonsignificant, orange: p≤0.1, yellow: p≤0.05. Bootstrapped p values are shown. Note: multivariable analysis was only performed in SKCM and BLCA anti-PD-1/L1 datasets due to the threshold of 7 events per predictor variable set, limiting the available large enough cohorts for the analysis. BCR, B-cell receptor; BLCA, bladder urothelial carcinoma; ORR, objective response rate; SKCM, skin cutaneous melanoma; TCGA, The Cancer Genome Atlas; TCR, T-cell receptor.

We then evaluated the prognostic and predictive value of the nine selected factors representing immune repertoire diversity, genomic, transcriptomic, and TME features by building univariate and multivariate Cox proportional hazard models across the SKCM, KIRC, BLCA, and STAD cohorts.

Univariable Cox proportional hazards analysis revealed that high TCR and BCR richness are positively associated with improved survival in SKCM, in both TCGA and anti-PD1/L1 cohorts, with similar HR magnitudes (TCR: HRTCGA=0.49, HRanti-PD1/L1=0.59; BCR: HRTCGA=0.54, HRanti-PD1/L1=0.58; p<0.05). BLCA patients with high TCR richness had more favorable outcomes (HRTCGA=0.73, p<0.1; HRanti-PD1/L1=0.76, p<0.1), yet, we did not find any other significant association of TCR or BCR diversity with survival in the rest of the cancer types evaluated (figure 4C). High B-cell infiltration was not consistently associated with improved survival in any cancer type. A possible explanation of the above is that, in SKCM patients, where high BCR richness improves survival significantly, having a more diverse antibody population is more important than the number of B-cells infiltrating the tumor site. Furthermore, high logTMB and CD8+T cells infiltration significantly positively impacted the survival of BLCA patients in both TCGA and anti-PD1/L1 cohorts. None of the nine selected factors were associated with OS in STAD patients. Figure 4C, online supplemental figures S6–8, and online supplemental tables S3-6 illustrate these findings, providing the hazard ratios and Kaplan-Meier plots of all the biomarkers for all analyses across different histologies.

Supplemental material

Supplemental material

Supplemental material

In conclusion, since the cohorts employed (TCGA and anti-PD-1/L1 datasets) differed mainly by the treatments administered, our results suggest that the high diversity of the TCR and BCR has a positive prognostic value in SKCM, regardless of therapy. High TCR richness also has a favorable impact on the OS of BLCA patients with marginal statistical significance. Furthermore, as measured by the HR, TCR and BCR diversity’s effect on survival was superior to some established biomarkers, such as PD-L1 expression and TIS-GEP in SKCM and BLCA cohorts.

Multivariable analysis integrating all available biomarkers showed that high TCR or BCR richness were not independent predictors of OS when all other features were considered. Notably, with the exception of high logTMB and CD8+T cell infiltration in BLCA, none of the other parameters, including the established biomarkers PD-L1, logTMB (in all cohorts except BLCA), or TIS-GEP, were independent OS prognostic factors.

Integrative multivariate modeling reveals TCR and BCR repertoire diversity as valuable biomarkers of clinical response following anti-PD-1/anti-PD-L1 monotherapy

Identifying biomarkers of response to ICI is a major challenge in clinical oncology. Thus, despite TCR and BCR richness not being independent prognostic factors for OS, we explored their potential for predicting objective responses to immunotherapy when combined with other biomarkers of the TIME. The workflow describing the overall strategy used to evaluate the predictive performance of the biomarkers and the three modeling histology schemes are presented in online supplemental figure 9.

Supplemental material

Overall, the multivariable models integrating several biomarkers outperformed the univariable biomarker models in predicting response to anti-PD-1/L1 immunotherapy, regardless of histology, emphasizing the need for more complex predictive models to achieve meaningful predictions (figure 5A). In the histology non-specific setting (scheme 1), the optimal multivariable model selected during recursive elimination (RFE-Naïve Bayes; RFE-NB) included TCR and BCR richness, logTMB, PD-L1 expression, CD8+, and CD4+T cells infiltration, B-cells infiltration, and tumor purity as important predictors (figure 5B). This optimal model demonstrated a high prediction accuracy when tested on the histology non-specific anti-PD1/L1 test cohort (consisting of BLCA, SKCM, and KIRC patients), with a ROC curve of 0.737, outperforming PD-L1 (ROC-AUC=0.584), TIS-GEP (ROC-AUC=0.637, pROC-AUC comparison<0.05) and marginally logTMB (ROC-AUC=0.601, pROC-AUC comparison =0.061) (figure 5A, online supplemental figure S10A). However, it did not yield high prediction performance when tested on the SKCM and BLCA anti-PD1/L1 validation data separately (figure 5A, online supplemental figure S10A). Interestingly, when tested on the external validation KIRC anti-PD-1/L1 cohort, the RFE-NB histology non-specific model (ROC-AUC=0.833), alongside the univariable TCR (ROC-AUC=0.743), BCR (ROC-AUC=0.726) richness, and TIS-GEP (ROC-AUC=0.741) models were superior to the established biomarkers logTMB (ROC-AUC=0.45) and PD-L1 (ROC-AUC=0.575) (figure 5A, online supplemental figure S10A), showing the potential of TCR and BCR richness in improving predictions of clinical response models. In the anti-PD-1/L1 STAD test cohort, the univariable TIS-GEP (ROC-AUC=0.858), alongside PD-L1 (ROC-AUC=0.809), showed the best diagnostic ability, outperforming TCR and BCR richness (figure 5A, online supplemental figure S10A).

Supplemental material

Figure 5

Prediction of response to immune checkpoint inhibitors using established and potential biomarkers. (A) Receiver operating characteristic curve-area under the curve (ROC-AUC) values of univariable models of established biomarkers (logTMB, PD-L1, TIS-GEP), TCR richness, and BCR richness compared with the multivariable, optimal recursive feature elimination (RFE) models. Columns represent univariable/multivariable models, while rows represent the histology non-specific/specific cohorts used to test each model. The sample size of the test cohorts used to evaluate each model is shown in parenthesis next to each cohort label. All three schemes are included: histology non-specific (scheme 1), BLCA-specific (scheme 2), and SKCM-specific (scheme 3). The dot size and color represent the ROC-AUC value, green; lower ROC-AUC values signify models with low discriminatory ability, and red, higher ROC-AUC values indicate models with high discriminatory predictive power. Columns, that is, uni/multivariable models, are ordered by mean ROC-AUC values across schemes and test cohorts. Note: Univariable logTMB and multivariable RFE models which use logTMB as a predictor were not fit for STAD since the anti-PD-1/L1 STAD cohort did not include TMB information. (B) Final selected predictors from the RFE process included in the optimal models across the three histology non-specific and histology-specific models schemes. Black: predictor included in the model; white: predictor not included in the model. BLCA, bladder urothelial carcinoma; SKCM, skin cutaneous melanoma; STAD, stomach adenocarcinoma; TIS-GEP, T-cell-inflamed signature gene expression profile; TMB, tumor mutational burden.

In the BLCA-specific setting (scheme 2), the optimal model (RFE-Support vector machine; RFE-SVM Radial) contained TCR and BCR richness combined with logTMB, PD-L1 expression, CD8+, and CD4+T cells infiltration, B-cells infiltration, and tumor purity as important covariates (figure 5B). This multivariable model yielded significantly high accuracy (ROC-AUC=0.766), outperforming the univariable TCR, BCR richness, PD-L1, and TIS-GEP models, but not logTMB (ROC-AUC=0.755, pROC-AUC comparison>0.05) (figure 5, online supplemental figure S10B). Finally, the optimal model in the SKCM-specific setting (scheme 3) (RFE-SVM-Linear) incorporated TCR and BCR richness as important predictors along with logTMB, B-cell infiltration, and tumor purity (figure 5B). It yielded a high prediction accuracy of ROC-AUC=0.807, significantly higher than the univariable BCR richness (pROC-AUC comparison =0.051), PD-L1, and GEP (pROC-AUC comparison<0.05) logistic regression models (figure 5, online supplemental figure S10C). However, with the available sample sizes, we did not get significantly different ROC curves compared with the logTMB model, which had moderate performance (ROC-AUC=0.679) (figure 5A, online supplemental figure S10C). Results of all the ROC curves comparisons of all models and across the three schemes are presented extensively in online supplemental file 4.

Supplemental material

In conclusion, our machine learning strategy showed that both TCR and BCR richness could be valuable predictors of response when combined with other known parameters. These results support the inclusion of TCR and BCR richness in more complex multivariable models to predict response to immunotherapy in multicancer and cancer-specific contexts.


Multiple biomarkers to enrich for patients to respond or progress following treatment with ICI are currently available.2 Combining biomarkers with multimodal strategies may increase the accuracy of predicting ICI outcomes,6 23 but current testing is far from precise. For example, previously published biomarkers may explain only ∼0.6 of the variance in ICI outcomes.33 An accurate prediction of treatment outcome is vital to design the optimal treatment strategy for the individual patient when multiple options are available, avoiding unnecessary side effects and reducing treatment costs. Characterizing the intratumoral TCR and BCR clonal architecture and quantifying complementary TIME parameters related to the immune repertoire presents a conceptually simple analysis with potentially broad application across studies and datasets.

This study investigated whether TCR and BCR repertoire measures are meaningfully associated with TIME parameters in a pan-cancer and multicancer context. Our results showed that TCR richness reflected a T-cell-activated TME across multiple cancers and was a favorable prognostic biomarker, thus indicating the utility of TCR richness in describing an effective T-cell tumor-specific immune response. Of note, having an (unselected) high CD8+ or CD4+ T cell infiltration was not consistent with a rich TCR repertoire, indicating a need to study the clonal diversity of specific subpopulations of intratumor T-cells. Assuming a constant proportion of TCRs to be tumor-antigen specific across samples, a rich intratumoral TCR repertoire can be used to measure the diversity of the adaptive immune repertoire directed to an individual patient’s cancer. However, our current approaches to uncovering the intratumoral TCR landscape do not inform the relationship between TCRs and tumor antigens specificity with T-cell (dys)functional and phenotypic profiles.

As recent studies have started to uncover the fundamental role of B-cells and TLS in guiding antitumor responses in patients receiving anti-PD-1/L1 immunotherapy,34–36 we evaluated the importance of pre-existing intratumoral B-cell infiltration and diversity of the BCR. We found that the (unselected) abundance of intratumoral B-cells did not correlate with the diversity of the BCR repertoire nor with improved survival across the multicancer cohort. Despite the interhistology and intrahistology variability of the BCR diversity, we observed a strong, consistent association of high BCR richness to measures indicating ongoing antibody production, as opposed to pathways related to other B cell functions, such as antigen presentation leading to intratumoral T-cell activation. Combined with the predictive value of BCR richness for response to ICI within multiparameter models, this information may indicate that antibody-producing B-cells exert a humoral adaptive immune response to cancer.

A notable observation from the survival and response prediction analyses is the heterogeneity of prognostic and predictive value of the different biomarkers across different tumor histological types. In particular, while the functional enrichment showed that in samples with high BCR richness, there is an ongoing antibody production across all tumor types, having a more diverse BCR significantly benefited patient survival only in SKCM. We hypothesize that specific cancer types, such as melanoma, are more susceptible to anticancer activities of B cell functions (such as antibody production). However, their exact mechanisms and roles are still elusive and would need further studies to be uncovered.

Overall, the low to moderate correlations between TCR/BCR richness and distinct TIME features demonstrate the potential to combine these biomarkers in multivariable predictive models of clinical outcomes to anti-PD-1/L1 immunotherapy. Although our results demonstrated that intratumoral TCR and BCR diversity, infiltration of B-cells, and tumor purity are biomarkers that cannot be used to identify tumor types with high response rates, the relationship between the T-cell and B-cell clonal architecture of a tumor and the response to anti-PD-1/L1 immunotherapy of an individual host appears highly complex. Previous studies have proposed that a high pretreatment or on-treatment clonal diversity of tumor-infiltrating lymphocytes positively correlates with response to ICI.16 37 However, other studies have suggested that only increased clonality levels during treatment associate with anti-PD-1 clinical outcomes.38 39 Our machine learning approach has provided evidence of the vital contribution of both TCR and BCR richness in multivariable predictive models and that they may represent emerging biomarkers to understand the dynamics of tumor–immune cell interactions. Notably, all the parameters included in our approach can be derived by leveraging RNA-Seq and optimized panel-based TMB measurement techniques, thereby reducing the overall workload toward clinical implementation. Our findings demonstrate the predictive advantage of using more complex models and consistent TCR and BCR richness selection as essential variables across several tumor types. Further gained insights from our study include the potential customized models needed for each histology since some of these biomarkers appear to also provide more tissue-specific benefits. We propose that TCR and BCR diversity substantially contributes to predicting clinical outcomes, especially when combined with established biomarkers. As new data emerge, prospective studies are warranted to better understand their mechanistic role and confirm the applicability of the TCR and BCR richness as prognostic and predictive biomarkers of response to anti-PD-1/L1 immunotherapy.

Our study has several limitations. First, an important issue was that dissecting T-cell/B-cell types and receptor repertoire information from (unsorted) bulk RNA-Seq data is cumbersome.40 41 Focusing on more specialized B-cell subsets and states or on TLS could contribute to developing a more precise B-cell-related biomarker. Still, limitations imposed by using bulk RNA-Seq data did not allow this. The performance of RNA-Seq based TCR and BCR repertoire profiling tools is affected by read length, RNA sequencing depth and reads topology. According to benchmarking studies, MiXCR is consistently highly efficient in capturing the majority of clonotypes even with reduced read lengths.42 43 All of the used datasets had adequate RNA sequencing depth for the profiling of TCR and BCR repertoires and, except one, all had 100 bp (paired-end) reads. Next, we acknowledge that our machine-learning strategy was limited by the available sample size, mainly when tissue-specific models were constructed. To enhance the reliability of our models’ generalization performance and prevent overoptimistic evaluation of our findings, we employed a feature engineering method, which captured the most relevant predictors from our data. In addition, we used the repeated k-fold cross-validation technique. Furthermore, to evaluate the models, we used the ROC-AUC metric, compared the curves via statistical comparisons with bootstrapping and determined CIs in survival analysis. By implementing this comprehensive approach, we ensured more accurate estimates and improved the robustness of our results. However, while separate test and external validation data were used to evaluate the performance of the models, the establishment of the biomarkers’ robustness and the developed models would necessitate additional independent validation cohorts.


In conclusion, our data added critical information on the role of intratumoral TCR diversity and suggested a neglected but positive role of a highly diverse BCR repertoire for intratumoral antibody production and, potentially, cancer control. This study also showed a potential benefit of TCR and BCR as predictive biomarkers—in combination with established biomarkers—for discriminating patients responding to anti-PD-1/L1 immunotherapy across several cancer types.

Supplemental material

Data availability statement

Data are available in a public, open access repository. Data are available on reasonable request. The data analyzed and processed in this study are available from the TCGA Research Network:, and from the clinical datasets available in the following public repositories: the Sequence Read Archive (GSE78220 and GSE91061), the European Nucleotide Archive (PRJEB23709 and PRJEB25780), the database of Genotypes and Phenotype (phs000452.V.3.p1, phs001493.V.1.p1, and phs001919.V.1.p1), and the European Genome-phenome Archive (EGAS00001002928, EGAS00001002556, EGAS00001004445). Restrictions apply to the availability of part of these data, which were used under license for this study. Data are available from the authors upon reasonable request.

Ethics statements

Patient consent for publication


The authors would like to acknowledge the TCGA Research Network as well as the authors who made their clinical datasets available in public repositories, such as the Sequence Read Archive (GSE78220 and GSE91061), the European Nucleotide Archive (PRJEB23709 and PRJEB25780), the database of Genotypes and Phenotype (phs000452.V.3.p1, phs001493.V.1.p1, and phs001919.V.1.p1), and the European Genome-phenome Archive (EGAS00001002928, EGAS00001002556, EGAS00001004445), and their respective respecting funding agencies. Therefore, this study was also supported by the National Human Genome Research Institute (NHGRI) Large Scale Sequencing Program, Grant U54 HG003067 to the Broad Institute (PI, Lander); an AACR KureIt grant (PI, Van Allen, Dana-Farber Cancer Institute, Boston, Massachusetts, USA, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA); Antoni Ribas MD, PhD University of California, Los Angeles, Funding Source:R35 CA197633 National Institutes of Health, Bethesda, Maryland, USA. Miklos Diossy, Ph.D. (Technical University of Denmark, Lyngby, Denmark) is acknowledged for providing valuable scientific insights. The authors would finally like to thank the Computerome 2.0 HPC Facility for hosting and 413 storing data and providing analysis tools that were widely used in this study.


Supplementary materials


  • Twitter @jonsson_lab, @doniamarco

  • Contributors Conceptualization, AS, ZSztupinszki and MD; methodology, AS, ZSztupinszki and ZSzallasi; software, AS; validation, AS, ZSztupinszki, ZSzallasi and GJ; formal analysis, AS; investigation, AS; resources, AS, ZSztupinszki, IMS, ZSzallasi and MD; data curation, AS; writing—original draft preparation, AS and MD; writing—review and editing, AS, ZSztupinszki, IMS, ZSzallasi, GJ and MD; visualization, AS; supervision, ZSztupinszki and MD; project administration, MD; funding acquisition, IMS and MD. All authors have read and agreed to the published version of the manuscript. MD is responsible for the overall content as guarantor.

  • Funding The study was sponsored by the Danish National Board of Health grant numbers 05-0400-18 and 05- 0400-50, 'Empowering Cancer Immunotherapy in Denmark', the Herlev and Gentofte Hospital Research Council under Clinician-Scientist Grant, and the Lundbeck Foundation under Grant R307-2018-3636. According to the use of clinical datasets available in dbGaP repositories, this study was also supported by the National Human Genome Research Institute (NHGRI) Large Scale Sequencing Program, Grant U54 HG003067 to the Broad Institute (PI, Lander); an AACR KureIt grant (PI, Van Allen, Dana-Farber Cancer Institute, Boston, MA, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA); Antoni Ribas MD, PhD University of California, Los Angeles, Funding Source:R35 CA197633 National Institutes of Health, Bethesda, Maryland, USA.

  • Competing interests MD has received honoraria for lectures from Roche and Novartis (past 3 years), proprietary data access from Bristol Myers Squibb and Genentech, and is the advisor of Achilles Therapeutics. IMS has received honoraria for consultancies and lectures from Novartis, Roche, Merck, and Bristol Myers Squibb; a restricted research grant from Novartis; and financial support for attending symposia from Bristol Myers Squibb, Merck, Novartis, Pfizer, and Roche. All other authors declare no conflicts of interest.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.