Society for Immunotherapy of Cancer clinical and biomarkers data sharing resource document: Volume I—conceptual challenges
  1. Sergio Rutella1,2,
  2. Michael A Cannarile3,
  3. Sacha Gnjatic4,
  4. Bruno Gomes5,
  5. Justin Guinney6,
  6. Vaios Karanikas7,
  7. Mohan Karkada8,
  8. John M Kirkwood9,
  9. Beatrix Kotlan10,
  10. Giuseppe V Masucci11,
  11. Els Meeusen12,
  12. Anne Monette13,
  13. Aung Naing14,
  14. Vésteinn Thorsson15,
  15. Nicholas Tschernia16,
  16. Ena Wang17,
  17. Daniel K Wells18,
  18. Timothy L Wyant19 and
  19. Alessandra Cesano20
  1. 1 John van Geest Cancer Research Centre, Nottingham Trent University, Nottingham, Nottinghamshire, UK
  2. 2 Centre for Health, Ageing and Understanding Disease (CHAUD), Nottingham Trent University, Nottingham, Nottinghamshire, UK
  3. 3 Roche Pharmaceutical Research and Early Development Oncology, Roche Innovation Center, Penzberg, Germany
  4. 4 Department of Medicine, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  5. 5 Roche Pharmaceutical Research and Early Development Oncology, Roche Innovation Center, Basel, Switzerland
  6. 6 Sage Bionetworks, Seattle, Washington, USA
  7. 7 Roche Pharmaceutical Research and Early Development Oncology, Roche Innovation Center, Zurich, Switzerland
  8. 8 Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts, USA
  9. 9 Department of Medicine, Division of Hematology/ Oncology, University of Pittsburgh School of Medicine and Melanoma Center at UPMC Hillman Cancer Center, Pittsburgh, Pennsylvania, USA
  10. 10 National Institute of Oncology, Budapest, Hungary
  11. 11 Department Oncology-Pathology, Karolinska Institutet, Stockholm, Sweden
  12. 12 CancerProbe Pty Ltd, Prahran, Victoria, Australia
  13. 13 Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
  14. 14 Department of Investigational Cancer Therapeutics, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
  15. 15 Institute for Systems Biology, Seattle, Washington, USA
  16. 16 Department of Medicine, Division of Hematology/Oncology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  17. 17 Allogene Therapeutics, South San Francisco, California, USA
  18. 18 Parker Institute for Cancer Immunotherapy, San Francisco, California, USA
  19. 19 Biolojic Design Inc, Cambridge, Massachusetts, USA
  20. 20 ESSA Pharma Inc, South San Francisco, California, USA
  1. Correspondence to Dr Alessandra Cesano; acesano{at}


The sharing of clinical trial data and biomarker data sets among the scientific community, whether the data originates from pharmaceutical companies or academic institutions, is of critical importance to enable the development of new and improved cancer immunotherapy modalities. Through data sharing, a better understanding of current therapies in terms of their efficacy, safety and biomarker data profiles can be achieved. However, the sharing of these data sets involves a number of stakeholder groups including patients, researchers, private industry, scientific journals and professional societies. Each of these stakeholder groups has differing interests in the use and sharing of clinical trial and biomarker data, and the conflicts caused by these differing interests represent significant obstacles to effective, widespread sharing of data. Thus, the Society for Immunotherapy of Cancer (SITC) Biomarkers Committee convened to identify the current barriers to biomarker data sharing in immuno-oncology (IO) and to help in establishing professional standards for the responsible sharing of clinical trial data. The conclusions of the committee are described in two position papers: Volume I—conceptual challenges and Volume II—practical challenges, the first of which is presented in this manuscript. Additionally, the committee suggests actions by key stakeholders in the field (including organizations and professional societies) as the best path forward, encouraging the cultural shift needed to ensure responsible data sharing in the IO research setting.

  • biomarkers
  • tumor
  • immunotherapy

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See

The unmet need for data sharing in immuno-oncology

Advances in cancer immunotherapy have significantly improved life expectancy, quality of life and overall survival for some patients with cancer. However, only a relatively small percentage of patients receiving immuno-oncology (IO) treatments currently achieve durable responses and lasting disease-free survival, and the mechanisms and patient attributes underlying immune responsiveness and resistance have yet to be fully understood.1 Given the high costs and potential side effects associated with these therapies, it is crucial to identify the subsets of patients who will clinically benefit from immunotherapy. In addition, understanding the biological basis of mechanisms of resistance to IO treatments is essential for identifying new therapeutic approaches, including the design of rational therapeutic combinations.2

It is now widely accepted that in the field of IO, the conventional single biomarker approach has failed to have a substantial impact in improving treatment efficacy or in informing the next generation of IO therapeutics due to the broad heterogeneity of malignant diseases and limited sizes of patient subgroups providing data.3 4 In addition, the complex and dynamic nature of the tumor microenvironment, together with variability in host genetic background and environmental factors, makes the identification and subsequent validation of biomarkers from single trials a nearly impossible endeavor. Since the determinants of cancer immune responsiveness are multifactorial (including genetic makeup of the patient, tumor genomic instability, epigenetic adaptation, and external modifiers such as the microbiome, concomitant medications, comorbidities, etc), collection and analysis of ‘big data’ sets for biomarker discovery are becoming the norm in the IO clinical research field.5–7 Therefore, converging efforts in biomarker identification and validation using existing datasets and combined patient populations with properly collected clinical and biomarker data (ie, meta-analyses) are greatly needed to guide patient selection, treatment decisions, and advances in new therapy developments in IO.

To date, the development and validation of new biomarkers for IO have been limited by the fragmentation of clinical research efforts. These efforts are often conducted in a drug-centric way, are statistically designed to determine efficacy or safety endpoints rather than biomarker validation endpoints, and use investigator/sponsor-preferred, non-standardized technology platforms with different levels of analytic validation applied to relatively small data sets (usually with low numbers of responders). The combination of these factors results in a lack of intrinsic robustness and in underpowered statistical outputs, making most observations hypothesis-generating at best.

In this context, harmonized clinical and biomarker data sharing can leverage, unify, and maximize sample sizes; enhance biomarker identification, validation, and impact on clinical decision making; and provide a launching pad to accelerate new discoveries. This can be achieved by using existing and completed studies, potentially avoiding future duplicative trials (unless there is a clear mechanistic reason that a drug within an existing class would have better efficacy in a different study population and/or indication). The success and benefit of data sharing will include facilitating the identification and validation of standardized biomarkers through meta-analysis and enabling maximal scientific knowledge and benefits to be gained from the efforts of clinical trial participants and investigators. However, the enabling of this paradigm shift in the field of clinical data sharing and IO biomarker discovery and validation will require a concerted effort from all involved stakeholders.
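The statistical benefit of pooling can be illustrated with a rough sketch. The snippet below assumes a simple normal-approximation (Wald) confidence interval and purely hypothetical cohort sizes and response rates; it shows how the uncertainty around an estimated response rate narrows as harmonized cohorts are combined. It deliberately ignores between-study heterogeneity, which a real meta-analysis would need to model.

```python
import math


def wald_half_width(responders: int, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% CI for a response rate."""
    p = responders / n
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical cohorts as (responders, patients); not data from any real trial.
cohorts = [(8, 40), (12, 60), (20, 100)]

# Uncertainty when only the first cohort is available.
single = wald_half_width(*cohorts[0])

# Uncertainty after naively pooling all cohorts into one data set.
pooled_responders = sum(r for r, _ in cohorts)
pooled_n = sum(n for _, n in cohorts)
pooled = wald_half_width(pooled_responders, pooled_n)

print(f"single cohort: +/- {single:.3f}")
print(f"pooled cohorts: +/- {pooled:.3f}")
```

Because the half-width scales with the inverse square root of the sample size, a fivefold increase in patients narrows the interval by a factor of roughly 2.2 in this idealized setting.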

Current challenges to data sharing can be divided into two major categories:

  • Conceptual Challenges: Key stakeholders may have conflicting interests (both financial and academic) and ethical and regulatory restrictions, which need to be taken into consideration and balanced for data sharing to become the norm in clinical research.8–12

  • Practical Challenges: Effective data sharing requires the improvement and/or standardization of the systems and protocols governing clinical research, including minimal information required for data alignment, standardized statistical approaches, data management infrastructure, technology for data acquisition and analysis, workforce training, and equitable distribution of the costs of research.12–14

The Society for Immunotherapy of Cancer (SITC) Biomarkers Committee formed the Clinical and Biomarkers Data Sharing Subcommittee to discuss challenges from both of these categories. Conceptual challenges and related possible mitigation steps to data sharing in IO are described in detail in this manuscript, while the practical challenges and recommended mitigation strategies are discussed in the companion manuscript ‘The SITC clinical and biomarkers data sharing resource document: Volume II—practical challenges.’ Importantly, these challenges need to be addressed in parallel and in a holistic way in order to make data sharing in IO the standard in clinical research.

An unprecedented example of the power of open data sharing has come from the COVID-19 pandemic. In January 2020, just days after isolation of the SARS-CoV-2 virus, its genome sequence was published and shared with the scientific community.15 Since then, characterization of the virus and associated disease has continued, with additional isolates sequenced and data openly shared, allowing researchers to monitor and track the progression of the pandemic. For example, NextStrain has been tracking and updating information on the progression and changes observed in the viral genome.16 Sequences can be uploaded and shared through the Global Initiative on Sharing all Influenza Data.17 Epidemiology data are being made available by the Johns Hopkins University of Medicine,18 Worldometers,19 the nCoV-2019 data working group,20 Our World in Data,21 and the World Health Organization (WHO)22 dashboards on the coronavirus disease pandemic, which keep ongoing statistics on the spread of the virus and apprise health officials and the world of the situation, thereby allowing for critical movement of resources to impacted regions.
The WHO has acted to ensure open access to scientific publications, creating a database of 77,103 citations on the COVID-19 pandemic and SARS-CoV-2 as of October 7, 2020.23 Importantly for the concerns of oncologists, the COVID-19 and Cancer Consortium was rapidly formed to share data on cancer patients with COVID-19 among many institutions.24 25 Additionally, many journals and professional societies, including SITC, have encouraged and enabled open access to articles related to COVID-19 research.26 As a consequence of efforts like those discussed above, the rapid availability of data such as epitope sequences, including the binding site of SARS-CoV-2 spike protein to its entry receptor, angiotensin converting enzyme 2 (ACE2), is propelling the generation of diagnostic tests, vaccines, and antibody therapeutics.27 28 Additionally, data sharing during the COVID-19 pandemic has been facilitated through the availability of preprint servers such as medRxiv and bioRxiv; however, the sheer number of preprint articles in COVID-19-related research has prompted calls to parse this non-peer-reviewed literature and grade articles by quality.29 30

Conceptual challenges to data sharing in IO

Ethical considerations: patient privacy and ‘big data’ challenges

Even though most informed consent processes approved by ethics committees have embedded clauses allowing the future nonprofit use of deidentified data, patients’ willingness to share their deidentified data does come with specific requirements in order to protect their privacy, choices, and needs for information regarding the use of their data. Efforts to balance patients’ right to data protection/confidentiality against the need to promote medical science advances through clinical research have been a topic of significant debate in recent years.31 To ensure ethical and regulatory compliance and the sustainability and success of data sharing initiatives in cancer research, appropriate legislation and guidelines need to be implemented. Joint multi-stakeholder agreements and efforts need to be pursued to foster cultural (ie, conceptual) and technological (ie, practical) adaptation or discovery-led changes that will enable patients’ preferences regarding sharing of their own health data to be respected.

In the US, the Health Insurance Portability and Accountability Act (HIPAA) is a federal law that governs the use and disclosure of personal health information by covered entities, defined as health plans, healthcare clearinghouses, and healthcare providers. The general rule is that personal health information cannot be disclosed without the patient’s authorization. Three primary measures protect patient privacy and confidentiality: (1) informed consent, (2) study review and approval by Institutional Review Boards (IRBs) and Research Ethics Committees (RECs), and (3) data use limited to deidentified information.

The intent behind informed consent regulations is to ensure that, before providing their consent, participants are fully aware of the study they are volunteering for, and that they have been given a clear and accurate account of the potential risks and expected benefits of the study. In relation to data sharing, it is important that the informed consent form includes appropriate language explaining the information collected, including data generated from integrative genetic/genomic assays on the patients’ specimens, and how these data will be stored and used, including the potential for their being shared outside the boundaries of the specific study in which they were collected, such as for integrative meta-analyses.

IRBs and RECs are tasked with reviewing, revising, and approving clinical investigation protocols involving humans with the goal of protecting research participants and ensuring that they are treated ethically in the course of their participation in studies.32–34 These committees pay close attention to the informed consent process and should provide guidance for investigators and informed consent templates for participants to enable responsible data sharing. In addition, IRBs should influence the responsible sharing of individual participant data by adding considerations for data sharing plans when assessing the benefits and risks of clinical trials in IO. Moreover, consistent policies within and between institutions are needed so that investigators, IRBs, and other institutional officials know what level of protection they can promise to participants and participants can make decisions based on accurate and truthful information.

Data deidentification by removing individually identifiable health information from data sets is another measure aimed to protect patient privacy. However, the increase of research involving ‘big data,’ particularly in IO, may impair efforts to maintain patient privacy. Because big data can contain family history and genetic information, absolute data deidentification can be challenging.35 36 Indeed, one of the main ethical challenges and risks posed to clinical trial participants that has been identified regarding data sharing is the possible reidentification of previously deidentified data. The standard data protections used to anonymize or deidentify patient-representative data were devised prior to current deep sequencing, multiplex technologies, and multiomics platforms that produce higher resolution, more detailed information on patient samples.37
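As a minimal sketch of what basic deidentification can look like in practice, the snippet below drops a set of direct identifier fields and replaces the patient ID with a keyed hash. The field names, the identifier list, and the `deidentify` helper are hypothetical and chosen only for illustration; they are not a regulatory checklist. As noted above, the fields that remain (for example, genomic or family history data) may still permit reidentification, so a step like this is necessary but not sufficient.

```python
import hashlib
import hmac

# Hypothetical direct identifiers; a real project would follow its own
# regulatory guidance rather than this illustrative list.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "phone", "email"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with a keyed hash."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    # A keyed hash (HMAC) gives a stable pseudonym that cannot be recomputed
    # without the secret held by the data custodian.
    digest = hmac.new(secret_key, str(record["patient_id"]).encode(), hashlib.sha256)
    cleaned["pseudonym"] = digest.hexdigest()[:16]
    return cleaned


# Hypothetical record; not data from any real patient.
record = {
    "patient_id": "MRN-001234",
    "name": "Jane Doe",
    "date_of_birth": "1960-02-29",
    "tumor_type": "melanoma",
    "pd1_response": "partial",
}

shared = deidentify(record, secret_key=b"site-held secret")
print(shared)
```

The same patient always maps to the same pseudonym under a given key, which allows records to be linked across data sets without exposing the original identifier.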

The National Science Foundation funded an initiative to establish the Council for Big Data, Ethics, and Society, which discusses the importance of maintaining ethical standards in the relatively new field of big data analysis. The council has noted that big data often resides outside of existing systems designed to ensure that research is ethical, and has identified several unique issues associated with big data research. These include: non-physical harm resulting from research (harm by enabling surveillance or discrimination), unpredictable uses of data in the future (beyond the scope of the original study), and the possibility that individually ‘safe’ data sets may be combined to generate data capable of causing harm to participants.38 The council also made a series of recommendations regarding policy, pedagogy, and networking.

Disease advocacy organizations share many of the same concerns, roles, and responsibilities as those of other non-profit funders and clinical trial sponsors with regard to data sharing. Whether through direct financial support (either alone or as part of a funding syndicate) or other forms of assistance (eg, participant recruitment or clinical research networks), disease advocacy organizations make significant contributions to the development and execution of clinical trials. These efforts give these organizations an opportunity to influence policies and strategies to encourage responsible sharing of clinical trial data.39

Data ownership: protection of intellectual property

Historically, the culture surrounding clinical research in industry has not encouraged the proactive sharing of clinical trial data, including biomarker data. There are several reasons for this reluctance. Investors and industry are mostly driven by business models that require the generation of revenue. Therefore, data gathered with industry sponsorship or in support of marketing applications is often considered commercially confidential and competitively advantageous, representing a significant resource investment by the sponsoring organization. As a consequence, new findings that could be competitively advantageous are commonly considered protected intellectual property (IP), which impedes data sharing. Unwillingness to relinquish control of data for fear of misuse and the potential for incidental findings that require regulatory application resubmission have been additional concerns among the pharmaceutical industry.

Over the last 5 years, data sharing within the pharmaceutical industry has evolved from being virtually non-existent to being a landscape in which most companies have participated in one form or another, both for the public good and for the opportunity to uncover new insights and add value to existing data. This has been particularly true during the current COVID-19 pandemic response, and for the field of IO, where access to ‘big’ datasets is crucial. In general, access to, and collaboration with, top scientists and the perceived trustworthiness of academic involvement in data analysis are important factors in the sharing of clinical trial data by pharmaceutical companies.40

The Foundation for the National Institutes of Health (FNIH) has launched and coordinated the Partnership for Accelerating Cancer Therapies (PACT) program, aimed at building an oncology clinical study database with state-of-the-art study design, technology implementation, and pipelines for databases, bioinformatics, and biostatistics, as well as planning for future public open access to the database. This program is truly a partnership effort contributed to by the US government, industry, biotech, biopharma, and academia. It lays the foundation for clinical data sharing in principle and in practice in the future and promotes the sharing of existing data for retrospective exploratory biomarker discovery. However, for the pharmaceutical sector to truly benefit, data collaboration needs to be incorporated into business as usual, rather than remaining the domain of special projects.

Participant motivations: formal recognition and career incentives

In most academic research communities, publications are the primary currency. Promotions, grants, and recruitment policies are often based on publication track records. The demands and offers of publication outlets, therefore, have an impact on the individual researcher’s data sharing disposition. Because academic success depends on published outputs, and because only a small fraction of the accrued data is published in the primary report of a clinical trial, many academics have been reluctant to share data. A 2018 European survey showed that more than 30% of researchers do not routinely share scientific data.41 Fear that competitors may publish novel findings and/or misuse the original data represents an additional hurdle to broad data sharing. This understandable response to a lack of incentives and protection is in line with other situations in the biomedical field, such as the publication of negative data.42 Thus, there is a cultural gap and a lack of a governing framework, both of which need to be addressed by educating the scientific community and by policy/guideline development. Data sharing in academia is a multidimensional effort that includes a diverse set of stakeholders, entities, and individual interests. In the fast-moving and fast-publishing modern era of biomedical sciences, unless sufficient recognition is given to the intellectual and physical efforts involved in designing, accruing, and curating comprehensive and usable clinical trial data sets slated for sharing with the greater medical community, individual investigators will continue to work in silos and forgo the data sharing process that is essential to continued advancement.

Integral steps toward remedying this issue are for governing frameworks to be built with data protection and data sharing in mind, and for clinical investigators to recognize the value of data sharing and the accompanying infrastructure. This could be promoted by inviting collaboration and co-authorship on publications resulting from the use of existing data in new research. In addition, funders can accelerate this process by providing tangible rewards to clinical investigators for data sharing activities. Research institutions and universities should also make sharing of clinical trial data a consideration in promotion of faculty members and assessment of programs. Finally, training for data science and collaboration with quantitative scientists to facilitate sharing and analysis of clinical trial data should also be emphasized.

Journals: impetus and platform for data sharing

Biomedical journals can play a central role in championing and providing recognition for data sharing as well. Currently, the academic h-index of success is calculated based on the set of the scientist’s most cited papers and the number of citations that they have received in other publications.43 Authors actively participating in data sharing should also be scored and recognized accordingly for advancing the field. With fewer restrictions in publication lengths and upper limits on numbers of references resulting from electronic/online journals (in contrast to past hard copy issues), journals should be responsible for ascertaining that all primary data set contributors be properly cited to increase their individual professional scores, enhancing academic or industrial recognition, and thereby sustaining the motivation to share data. More importantly, journals can take the lead in promoting data sharing norms.
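For readers unfamiliar with the metric, the h-index is the largest number h such that a scientist has h papers with at least h citations each. The sketch below computes it from a list of per-paper citation counts; the counts shown are hypothetical. Note that nothing in this calculation credits data sets contributed to other groups' publications, which is the gap described above.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for six papers.
papers = [25, 8, 5, 3, 3, 1]
print(h_index(papers))  # prints 3
```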

While a statement recommending data sharing has been accepted as a step in the right direction by the International Committee of Medical Journal Editors initiative, response was starkly divided, and prominent research funders including the Wellcome Trust, National Institutes of Health (NIH), Medical Research Council (MRC), Cancer Research UK, and Bill & Melinda Gates Foundation declared that the mandates are still vague and open for interpretation.44

Nevertheless, many higher impact journals already mandate data set deposition to repositories (eg, the Gene Expression Omnibus or the database of Genotypes and Phenotypes) at the time of manuscript submission, in addition to mandates on explicit clarity on algorithms used for data analyses reported. Though repositories provide templates and guidelines for correct summary reporting for gene expression and DNA sequencing data set submissions, there is a lack of common clinical data elements from patients. Such a set of common elements would enhance the usability and harmonization of data sets for meta-analyses investigating predictive biomarkers for immunotherapies, and could be established by journal-linked repositories. A notable set of meta-studies demonstrated that journals that require data sharing by authors for peer review and publication typically have significantly higher impact factors, suggesting that journals may also benefit from mandating and streamlining standards (eg, providing guidelines and templates) that ensure sustainable data sharing.45 46

Additionally, preprint servers, including bioRxiv and medRxiv, operated by Cold Spring Harbor Laboratory, could accelerate the pace of science and expedite data sharing with a large audience long before research data are published in peer-reviewed journals, while also ensuring that ideas are not ‘stolen’ and that a record of priority is established. This avenue of data sharing has been used frequently, most recently during the COVID-19 pandemic, motivated by the need to share emerging relevant information in real time with the medical and scientific community. Major research funders, including the MRC, the NIH, and the Howard Hughes Medical Institute, have recently begun to encourage grant applicants to cite preprints in their funding proposals. However, it should be made clear that preprints reporting the results of clinical trials have not been peer-reviewed and should therefore not be used to guide clinical decision making.

Further, to create a responsible medium for biomarker data sharing, societies such as SITC and journals (including the Journal for Immunotherapy of Cancer (JITC)) can initiate and support an independent body that promotes good practices and standardized data stewardship to ensure that biomarker data are fairly and consistently presented; valid based on sample size and methodology; and accessible, interpretable, reusable, and interoperable as per the ‘Findable, Accessible, Interoperable and Reusable’ (FAIR) guiding principles for scientific data management and stewardship.47

Professional societies: hubs for data sharing and biomarker development

At the intersection of improvements in patient outcomes and the advancement of IO research are professional societies. Organizations including SITC, the American Association for Cancer Research (AACR), the American Association of Immunologists (AAI), the American Society of Clinical Oncology (ASCO), the American Society of Hematology (ASH), and the American Society for Transplantation and Cellular Therapy (ASTCT), among several others, represent the collective interests of thousands of physicians, research scientists, advanced practitioners, nurses, biostatisticians, and patient advocates.

With the historical strength of professional societies as a guide, these organizations have great leverage in the pursuit of large-scale collaboration and data sharing. As an example, ASH is creating infrastructure and designing a process, called the ASH Research Collaborative (ASH RC), for sharing genomic and clinical data on multiple myeloma, an initiative that could serve as a model for other diseases. The ASH RC Data Hub will also host industry or government datasets and will be able to accommodate data in different formats, including patient-reported instruments and manual chart abstraction.48

Another example, the AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE), is a publicly accessible international cancer registry assembled through data sharing.49 The registry links high-quality next-generation cancer genomic sequencing data with clinical outcomes obtained during routine medical practice from nearly every patient with cancer treated at 19 of the leading oncology institutions in the world, including individuals with rare cancers. The AACR Project GENIE therefore potentially provides the statistical power required to inform clinical decision making and to foster novel clinical and translational research.

The leverage possessed by professional societies could also be used to aid in setting standards or in establishing commonplace norms for data sharing within the immunotherapy space. For example, societies can allocate resources to convene workshops and establish committees of experts that can develop standard operating procedures, set standards for research and clinical practice within the field, and publish white papers advocating for particular research priorities or policy positions. In this vein, the SITC Biomarkers Committee has held workshops and generated outputs aimed at providing a platform for representatives to discuss the advantages and challenges of data sharing from federal, academic, and industrial perspectives.50 The Biomarkers Committee has also drawn attention to the importance of standardized and validated assays, a crucial component of successful data sharing.51–53 Furthermore, the society’s open-access journal, JITC, has been on the leading edge of important developments in the field since its inception through the publication of original research within its dedicated Immunotherapy Biomarkers section.

Societies may also provide funding for grants or programs that enable or enhance data sharing, in the clinical as well as the basic and translational research settings. These funding initiatives can play a vital role in unifying ongoing projects and preventing duplication of efforts. One such example is TimIOs, a concept established through SITC’s Sparkathon Program that would function as an honest broker for data set amalgamation between biotech companies.50 The eventual goal of the project is to identify biomarkers that differentiate between patients with tumors that are likely to respond strongly or weakly to anti-PD-1 therapies.

A more ambitious, but potentially more rewarding, proposal would be for societies to initiate or promote the development of an immunotherapy outcomes registry. The proposed registry network of IO data, contributed on a voluntary basis, could be initiated in principle by leveraging the political capital of medical professional societies through policy and advocacy efforts toward a larger national or international initiative. An example that could serve as a model registry program for society-led efforts is the FNIH PACT, which is engaged in efforts on behalf of several academic institutions, biotech and pharmaceutical companies, and the federal government. It is involved in standardizing immunotherapy assays with the Cancer Immune Monitoring and Analysis Centers and in housing these data from privately or NIH-funded studies through a cloud-based, central data repository as part of the Cancer Immunologic Data Commons Network. However, the FNIH PACT effort does not include a larger mechanism to globally collect biomarker and outcomes data for those oncology patients treated with immunotherapeutics in ‘real-world’ settings,50 which is an area of need that societies could potentially address with their own registry programs.

Professional society initiatives aimed at clinical data sharing have proven successful. One of the most prominent examples comes from the field of hematopoietic stem cell transplantation. In 1970, the International Bone Marrow Transplant Registry (IBMTR) was founded as a division of the American College of Surgeons/National Institutes of Health Organ Transplant Registry.54 By 1976, with the dissolution of the American College of Surgeons Organ Transplant Registry, the IBMTR became a standalone agency based in Milwaukee, Wisconsin and supported by the US Department of Health, Education and Welfare. In 2004, the IBMTR merged with the National Bone Marrow Donor Registry to form one comprehensive program, the Center for International Blood & Marrow Transplant Research, which has continued to accrue invaluable data related to hematopoietic stem cell transplantation.55 These data have played a pivotal role in shaping the development of hematopoietic stem cell transplants.

Another example of a federal-level program that could serve as an effective model for future society programs is The Cancer Genome Atlas (TCGA), a successful data sharing initiative that has generated and compiled comprehensive genomic, transcriptomic, epigenomic, and proteomic data from bulk tumor samples across 33 human malignancies.56 57 Through this approach and collaborative effort, six immune subtypes of solid tumors of prognostic significance have been identified.56 All data and results are intended to serve as a resource for future studies and can be freely accessed at the National Cancer Institute Genomic Data Commons, which houses data not only from TCGA but from several other cancer genomic projects, and through the Cancer Research Institute iAtlas portal, which provides interactive visualization of immune response in tumors. An in-depth atlas of the immune microenvironment of human tumors has been developed in clear cell renal cell carcinoma and early lung cancer, representing an invaluable tool for the rational design of targeted therapies as well as combination immunotherapy clinical trials.58 59 The Human Protein Atlas is another example of successful data sharing. The program was initiated in 2003 in Sweden and aims to map all the human proteins in cells, tissues and organs using integrative-omics technologies. The online platform provides access to the distribution of the proteins across tissues and organs, the subcellular localization of proteins, and potential prognostic impacts of protein expression on cancer patient survival.


In summary, IO data sharing is an idea that is not only beneficial in concept, but one that promises to be a powerful tool in combating cancer once made routine. Joint efforts from the diverse stakeholders described in this paper are required to champion, adopt, and foster a culture of data sharing with realistic goals. Data sharing also requires entities that can provide management and oversight to ensure the protection of patient privacy, IP for funding organizations, and primary publication rights. These entities should also work to ensure the quality, integrity, and completeness of shared information, and that protocols governing data access and reuse are ethical, fair, and transparent. Data sharing will help break down silos that can inadvertently lead to the duplication of work and wasted research dollars. It is not merely possible but only a matter of time before the cycle of data siloing is broken. In the future, clinical study data will not be boxed away after the initial study-focused analysis, but reused to accelerate the pace of discovery in the field. As we take stock of those stakeholders who hold the potential to shift our culture of competition into one of collaboration, we see the role that professional societies such as SITC have played, and can continue to play, as champions for such change.

Worldwide, research policy makers support the accessibility of research data. This can be seen in the US with efforts by the NIH and in Europe, with the European Union’s Horizon 2020 Program.60–63 In order to develop consequential policies for data sharing, policy-makers need to understand and address the perspectives of all involved parties. We hope that the framework that we present in this paper will encourage a better understanding of the prevailing issues and provide insights into the underlying dynamics of academic and Pharma data sharing.


The authors would like to dedicate this manuscript to the memory of Beatrix Kotlan, who served on the SITC Biomarkers Committee and made significant contributions in authoring this work. The authors acknowledge SITC staff for their contributions, including Ben Labbe, PhD for medical writing and editorial support, and Lionel Lim; Sam Million-Weaver, PhD; and Angela Kilbert for project management and assistance. Additionally, the authors wish to thank the society for supporting the development of the manuscript.



  • Twitter @AnaingMD, @SITCancer

  • Deceased Dr. Kotlan passed away on April 29th, 2020.

  • Contributors AC and SR, the Chairs of the SITC Clinical and Biomarkers Data Sharing Subcommittee, provided guidance on the manuscript structure and content as well as leadership of the manuscript development group. SG, BK, MK, AM, NT and EW served as section leads for the manuscript development teams. All authors actively contributed to the manuscript development through providing content, critically reviewing drafts, and advising on additions and changes throughout the process. All authors have read and approved the final version of this manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests AC—Employee: ESSA Pharma; Consulting fees: Refuge Bio, Arch Oncology, Qognit, Nanostring. BG—Employee: Hoffmann La Roche; Stockholder: Roche. MAC—Employee: Roche Diagnostic GMBH; Stakeholder: Roche; Patent: 10878NDR. SG—Consultancy/Advisory Fees: Merck, Neon Therapeutics, OncoMed; Research Funding: Agenus, Bristol-Myers Squibb, Genentech, Immune Design, Janssen R&D, Pfizer, Regeneron, Takeda. JMK—Grant Funding: Prometheus, Merck; Personal Fees: Array Biopharma, Bristol-Myers Squibb, Novartis, Roche; Grants and Personal Fees: Immunocore. EM—Director/Shareholder: CancerProbe Pty Ltd. VK—Employee: Hoffmann La Roche; Stockholder: Roche; Patent: EP3221355A1. AN—Consulting Fees: CytomX Therapeutics, Novartis, Kymab, Genome; Contracted Research: National Cancer Institute, EMD Serono, MedImmune, Healios Onc. Nutrition, Atterocor, Amplimmune, ARMO Biosciences, Eli Lilly, Karyopharm Therapeutics, Incyte, Novartis, Regeneron, Merck, Bristol-Myers Squibb, Pfizer, CytomX Therapeutics, Neon Therapeutics, Calithera Biosciences, TopAlliance Biosciences, Kymab, PsiOxus; Travel Accommodation: ARMO Biosciences; Partner Contracted Research: Immune Deficiency Foundation. TLW—Employee/Stockholder: Biolojic Design. DKW is a scientific co-founder, equity holder and paid advisor to Immunai. SR, JG, BK, MK, AM, GVM, VT, NT, and EW have nothing to disclose. SITC Staff: AK, BL, LL, and SMW have nothing to disclose.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.