Resource | Description | URL link |
CIMAC/CIDC network | The Cancer Immune Monitoring and Analysis Centers (CIMAC) and the Cancer Immunologic Data Commons (CIDC) are NCI-funded academic centers for advanced clinical trial immune monitoring. | https://cimac-network.org/ |
PACT | The Partnership for Accelerating Cancer Therapies (PACT) is a public–private collaboration that extends the CIMAC/CIDC activities to include additional non-NCI clinical trials. | https://fnih.org/what-we-do/programs/partnership-for-accelerating-cancer-therapies |
Links to FDA biomarker approval | The FDA’s Center for Drug Evaluation and Research works with stakeholders to identify and develop new biomarkers, review biomarkers for use in regulatory decision-making, and qualify biomarkers for specific contexts of use. | https://www.fda.gov/drugs/drug-development-tool-qualification-programs/cder-biomarker-qualification-program |
Public databases | ImmPort is a data repository and sharing tool built by NIAID for immunology-related assay data of various types. | http://www.immport.org |
| The Cancer Genome Atlas is a database of sequences from over 20,000 cancer and matched normal tissues. | https://portal.gdc.cancer.gov |
Transcription factor binding site prediction software | Transcription factor (TF) binding site prediction is important for deciphering gene regulation at the transcriptional level. TF binding sites are typically identified either by matching to a consensus sequence or by using position-specific scoring matrices (PSSMs). PSSMs can be obtained from resources including the commercial transcription factor database (TRANSFAC) and the open access database JASPAR:
In 2005, Tompa M et al (ref. 587) evaluated 13 algorithms designed to identify cis-regulatory sites using TF binding sites from TRANSFAC. Their results revealed that the Weeder algorithm performed best:
A set of de novo motif discovery tools, namely rGADEM (R-based genetic algorithm-guided formation of spaced dyads coupled with an expectation-maximization (EM) algorithm for motif discovery), HOMER (hypergeometric optimization of motif enrichment), MEME-ChIP (multiple EM for motif elicitation-chromatin immunoprecipitation), and ChIPMunk (a modification of the classical EM approach), were also evaluated using ChIP-seq data from ENCODE. The study showed that rGADEM was the best-performing tool for creating PSSMs from high-throughput ChIP-seq data. FIMO (Find Individual Motif Occurrences) and MCAST (Motif Cluster Alignment and Search Tool) were the best-performing TF binding site prediction tools for scanning PSSMs against DNA:
| |
Tools for neoantigen prediction | Neoantigens are small peptides derived from mutated proteins in cancer cells that can be recognized as foreign by immune cells and trigger an immune response. Computational methods and tools face many challenges in identifying neoantigens and in predicting which of them may serve as optimal targets for the development of immunotherapy approaches:
MHC binding has been considered a necessary step for neoantigens to be recognized by T cell receptors. The MHC binding prediction methods can be categorized as binding motif-based, position-specific score-based or matrix-based, and machine learning-based, such as artificial neural networks (ANN) or support vector machines. Because of the polymorphic nature of MHC class II molecules and variations in accepted peptide length, the prediction results for MHC class II binding are less accurate than those for MHC class I. Many existing MHC binding peptide and T cell epitope databases could potentially serve as a training data pool to develop prediction models. A good example is the Immune Epitope Database (IEDB), which provides a comprehensive resource for experimental data on antibody and T cell epitopes studied in multiple diseases:
Not all MHC binding peptides are immunogenic. Combination approaches have been developed that use additional information (eg, proteasome cleavage) to reduce the false positive rate. Because the stability of the peptide–MHC interaction has been shown experimentally to correlate more strongly with T cell immunogenicity than binding affinity does, netMHCstabpan (pan-specific prediction of peptide–MHC class I complex stability) uses a neural network approach based on a data set of stability values calculated for different peptide–MHC class I complexes, rather than their binding affinity values:
Many pipelines have been developed for neoantigen prediction from WES data via integration of multiple methods. For example, MuPeXI (mutant peptide extractor and informer) is a program that identifies tumor-specific peptides from sequencing data and assesses their potential to be neoantigens. The peptides are sorted according to a priority score, which is intended to roughly predict immunogenicity. pVACSeq (personalized Variant Antigens by Cancer Sequencing), a flexible, streamlined computational workflow for neoantigen identification, integrates tumor mutation and expression data:
| IEDB: http://tools.iedb.org/main/datasets |
CTRs |
ANN, artificial neural networks; CIDC, Cancer Immunologic Data Commons; CIMAC, Cancer Immune Monitoring and Analysis Centers; CTR, clinical trial registry; EM, expectation maximization; FDA, Food and Drug Administration; FIMO, Find Individual Motif Occurrences; HOMER, hypergeometric optimization of motif enrichment; IEDB, Immune Epitope Database; MCAST, Motif Cluster Alignment and Search Tool; MHC, major histocompatibility complex; MuPeXI, mutant peptide extractor and informer; NCI, National Cancer Institute; NIAID, National Institute of Allergy and Infectious Diseases; PACT, Partnership for Accelerating Cancer Therapies; PSSM, position-specific scoring matrix; pVACSeq, personalized Variant Antigens by Cancer Sequencing; rGADEM, R-based genetic algorithm-guided formation of spaced dyads coupled with an EM algorithm for motif discovery; TF, transcription factor; WES, whole exome sequencing.
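
As an illustration of the PSSM-based scanning mentioned in the TF binding site row of the table, the following is a minimal sketch of how a count matrix can be converted to log-odds scores and slid along a DNA sequence. It is not the method used by FIMO, MCAST, or any other tool listed above. The function names, the pseudocount of 1, the uniform background of 0.25, and the toy count matrix are all assumptions made for illustration only; real matrices would come from a resource such as JASPAR or TRANSFAC.

```python
import math

BASES = "ACGT"


def log_odds_pssm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PSSM.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pssm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return (start, score) pairs.

    Only the forward strand is scanned, and the sequence is assumed to
    contain A, C, G, and T only (other symbols raise KeyError).
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Hypothetical count matrix for a 4-bp motif resembling "TGCA".
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pssm = log_odds_pssm(toy_counts)
hits = scan("AATGCATT", toy_pssm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In this toy case the highest-scoring window starts at index 2, where the sequence matches the motif exactly. Production tools additionally scan the reverse strand, use sequence-specific background models, and convert scores to p values, none of which is attempted here.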