Table 2

Online resources: tools for the bench and other useful websites

Resource | Description | URL link
CIMAC/CIDC network | The Cancer Immune Monitoring and Analysis Centers (CIMAC) and the Cancer Immunologic Data Commons (CIDC) are NCI-funded academic centers for advanced clinical trial immune monitoring. | https://cimac-network.org/
PACT | The Partnership for Accelerating Cancer Therapies (PACT) is a public–private collaboration that extends the CIMAC/CIDC activities to include additional non-NCI clinical trials. | https://fnih.org/what-we-do/programs/partnership-for-accelerating-cancer-therapies
Links to FDA biomarker approval | The FDA’s Center for Drug Evaluation and Research works with stakeholders to identify and develop new biomarkers, review biomarkers for use in regulatory decision-making, and qualify biomarkers for specific contexts of use. | https://www.fda.gov/drugs/drug-development-tool-qualification-programs/cder-biomarker-qualification-program
Public databases | ImmPort is a data repository and sharing tool built by NIAID for immunology-related assay data of various types. | http://www.immport.org
Public databases | The Cancer Genome Atlas is a database of sequences from over 20,000 cancer and matched normal tissues. | https://portal.gdc.cancer.gov
Transcription factor binding site prediction software | Transcription factor (TF) binding site prediction is important for deciphering gene regulation at the transcriptional level. TF binding sites are typically identified either by matching to a consensus sequence or by scoring with position-specific scoring matrices (PSSMs). PSSMs can be obtained from resources including the commercial transcription factor database TRANSFAC and the open-access database JASPAR:
  • Computer methods to locate signals in nucleic acid sequences.584

  • TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.585

  • JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.586
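
The PSSM-based identification described above can be sketched in a few lines. This is a minimal illustration, not the scoring scheme of TRANSFAC, JASPAR, or any specific tool: the aligned sites in `SITES` are hypothetical, and the uniform background frequency, the pseudocount, and the function names `build_pssm`, `scan`, and `best_hit` are assumptions made for the example.

```python
import math

# Hypothetical aligned binding sites, for illustration only
# (not taken from TRANSFAC or JASPAR).
SITES = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM (one dict per position) from equal-length sites.

    Assumes a uniform background and a simple additive pseudocount.
    """
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return (offset, score) pairs.

    Assumes the sequence contains only uppercase A, C, G and T.
    """
    width = len(pssm)
    return [
        (i, sum(pssm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, pssm):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(sequence, pssm), key=lambda hit: hit[1])


if __name__ == "__main__":
    pssm = build_pssm(SITES)
    offset, score = best_hit("GGCGTATAAAGGC", pssm)
    print(offset, round(score, 2))
```

In practice a score threshold, often derived from a background model, decides which windows are reported as putative sites; scanning the reverse strand is also omitted here for brevity.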


In 2005, Tompa et al 587 evaluated 13 algorithms designed to identify cis-regulatory sites, using TF binding sites from TRANSFAC. In their assessment, the Weeder algorithm performed best:
  • Assessing computational tools for the discovery of transcription factor binding sites.587


A set of de novo motif discovery tools, namely rGADEM (R-based genetic algorithm-guided formation of spaced dyads coupled with an expectation-maximization (EM) algorithm for motif discovery), HOMER (hypergeometric optimization of motif enrichment), MEME-ChIP (multiple EM for motif elicitation-chromatin immunoprecipitation), and ChIPMunk (a modification of the classical EM approach), was also evaluated using ChIP-seq data from ENCODE. The study showed that rGADEM was the best-performing tool for creating PSSMs from high-throughput ChIP-seq data. FIMO (Find Individual Motif Occurrences) and MCAST (Motif Cluster Alignment and Search Tool) were the best-performing TF binding site prediction tools for scanning PSSMs against DNA:
  • Evaluating tools for transcription factor binding site prediction.588

Tools for neoantigen prediction | Neoantigens are small peptides derived from mutated proteins in cancer cells that can be recognized as foreign by immune cells and trigger an immune response. Computational methods and tools face many challenges in identifying neoantigens and in predicting which may serve as optimal targets for the development of immunotherapy approaches:
  • Neoantigens in cancer immunotherapy.589

  • Computational genomics tools for dissecting tumour-immune cell interactions.590

  • Applications of immunogenomics to cancer.490


MHC binding has been considered a necessary step for neoantigens to be recognized by T cell receptors. The MHC binding prediction methods can be categorized as binding motif-based, position-specific score-based or matrix-based, and machine learning-based, such as artificial neural networks (ANN) or support vector machines. Because of the polymorphic nature of MHC class II molecules and variations in accepted peptide length, the prediction results for MHC class II binding are less accurate than those for MHC class I. Many existing MHC binding peptide and T cell epitope databases could potentially serve as a training data pool to develop prediction models. A good example is the Immune Epitope Database (IEDB), which provides a comprehensive resource for experimental data on antibody and T cell epitopes studied in multiple diseases:
  • SYFPEITHI: database for MHC ligands and peptide motifs.591

  • Profile analysis: detection of distantly related proteins.592

  • Gapped sequence alignment using artificial neural networks: application to the MHC class I system.593

  • NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets.594

  • NetMHCpan, a method for MHC class I binding prediction beyond humans.595

  • Application of support vector machines for T-cell epitopes prediction.596

  • SVMHC: a server for prediction of MHC-binding peptides.597

  • The immune epitope database and analysis resource: from vision to blueprint.598

  • The immune epitope database (IEDB) 3.0.599

  • IEDB: http://tools.iedb.org/main/datasets 600
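
Of the prediction categories listed above, the binding motif-based approach is the simplest to sketch. The example below is a deliberately simplified illustration: the anchor preferences in `ANCHORS` are an assumed, textbook-style approximation for one class I allele and 9-mer peptides, not values taken from SYFPEITHI or IEDB, and the function names are hypothetical. Matrix-based and machine learning-based methods replace this yes/no test with a graded score.

```python
# Simplified, assumed anchor preferences for 9-mer peptides
# (0-based positions 1 and 8 correspond to peptide positions 2 and 9).
# Illustrative only; not taken from SYFPEITHI or IEDB.
ANCHORS = {
    "HLA-A*02:01": {1: set("LM"), 8: set("VL")},
}


def matches_motif(peptide, allele, anchors=ANCHORS):
    """Return True if a 9-mer carries an allowed residue at every anchor position."""
    if len(peptide) != 9:
        return False
    return all(peptide[pos] in allowed for pos, allowed in anchors[allele].items())


def candidate_binders(protein, allele, length=9):
    """Slide a window over the protein and keep peptides that match the motif."""
    windows = (protein[i:i + length] for i in range(len(protein) - length + 1))
    return [pep for pep in windows if matches_motif(pep, allele)]


if __name__ == "__main__":
    # Hypothetical protein fragment used only to exercise the functions.
    print(candidate_binders("MGLCTLVAMLK", "HLA-A*02:01"))
```

A motif filter of this kind is fast but coarse: it ignores non-anchor positions entirely, which is one reason scoring matrices and neural networks are generally preferred for ranking candidates.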


Not all MHC binding peptides are immunogenic. Combination approaches have been developed that use additional information (eg, proteasome cleavage) to reduce the false positive rate. Because the stability of the peptide–MHC interaction has been shown experimentally to correlate more strongly with T cell immunogenicity than binding affinity does, netMHCstabpan (pan-specific prediction of peptide–MHC class I complex stability) uses a neural network approach trained on a data set of stability values for different peptide–MHC class I complexes, rather than on their binding affinity values:
  • Pan-specific prediction of peptide-MHC class I complex stability, a correlate of T cell immunogenicity.601


Many pipelines have been developed for neoantigen prediction from WES data via integration of multiple methods. For example, MuPeXI (mutant peptide extractor and informer) is a program that identifies tumor-specific peptides from sequencing data and assesses their potential to be neoantigens. The peptides are sorted according to a priority score that is intended to roughly predict immunogenicity. pVACSeq (personalized Variant Antigens by Cancer Sequencing) is a flexible, streamlined computational workflow that integrates tumor mutation and expression data:
  • MuPeXI: prediction of neo-epitopes from tumor sequencing data.602

  • pVAC-Seq: A genome-guided in silico approach to identifying tumor neoantigens.603
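
A common first step in pipelines of this kind is to enumerate every peptide window that spans a mutated residue, paired with its unmutated counterpart. The sketch below illustrates that step only, for a single missense substitution; it is not the MuPeXI or pVACSeq implementation, and the function name `mutant_peptides`, the peptide lengths of 8 to 11 residues, and the example sequence are assumptions. Binding prediction and prioritization would follow as separate steps.

```python
def mutant_peptides(wild_type, position, alt, lengths=(8, 9, 10, 11)):
    """Enumerate peptides that span a single missense substitution.

    wild_type: normal protein sequence
    position:  0-based index of the substituted residue
    alt:       replacement amino acid (one letter)
    Returns a list of (mutant_peptide, normal_peptide) pairs.
    """
    mutant = wild_type[:position] + alt + wild_type[position + 1:]
    pairs = []
    for k in lengths:
        # Window starts for which the mutated residue lies inside the window
        # and the window stays within the protein.
        first = max(0, position - k + 1)
        last = min(position, len(mutant) - k)
        for start in range(first, last + 1):
            pairs.append((mutant[start:start + k], wild_type[start:start + k]))
    return pairs


if __name__ == "__main__":
    # Hypothetical 20-residue sequence with residue 10 changed from M to A.
    for mut, norm in mutant_peptides("ACDEFGHIKLMNPQRSTVWY", 10, "A", lengths=(9,)):
        print(mut, norm)
```

Keeping the normal peptide alongside each mutant peptide makes it straightforward to compare predicted binding of the two later, or to discard mutant peptides that also occur elsewhere in the normal proteome.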

IEDB: http://tools.iedb.org/main/datasets
CTRs
  • ANN, artificial neural networks; CIDC, Cancer Immunologic Data Commons; CIMAC, Cancer Immune Monitoring and Analysis Centers; CTR, clinical trial registry; EM, expectation maximization; FDA, Food and Drug Administration; FIMO, Find Individual Motif Occurrences; HOMER, hypergeometric optimization of motif enrichment; IEDB, Immune Epitope Database; MCAST, Motif Cluster Alignment and Search Tool; MHC, major histocompatibility complex; MuPeXI, mutant peptide extractor and informer; NCI, National Cancer Institute; NIAID, National Institute of Allergy and Infectious Diseases; PACT, Partnership for Accelerating Cancer Therapies; PSSM, position-specific scoring matrix; pVACSeq, personalized Variant Antigens by Cancer Sequencing; rGADEM, R-based genetic algorithm-guided formation of spaced dyads coupled with an EM algorithm for motif discovery; TF, transcription factor; WES, whole exome sequencing.