Treatment with checkpoint inhibitors in a metastatic colorectal cancer patient with molecular and immunohistochemical heterogeneity in MSI/dMMR status

Background
Analysis of deficiency in DNA mismatch repair (dMMR) is currently considered a standard molecular test in all patients with colorectal cancer (CRC) for its implications in screening, prognosis and prediction of benefit from immune checkpoint inhibitors. While the molecular heterogeneity of CRC has been extensively studied in recent years, specific data on dMMR status are lacking, and its clinical consequences are unknown.
Case presentation
We report the case of a metastatic CRC (mCRC) patient with immunohistochemical and molecular heterogeneity in dMMR/microsatellite instability status in the primary tumour. The patient was treated with nivolumab plus ipilimumab and achieved a deep and lasting response with clear clinical benefit. Whole-exome sequencing and RNA-seq data are reported to support the evidence for molecular heterogeneity. Re-biopsy at the time of progression ruled out the selection of MMR proficient clones as an escape mechanism. A large single-institution retrospective dataset was interrogated to further explore the real incidence of heterogeneity in its different presentations.
Conclusions
The present case supports the efficacy of immune checkpoint inhibition in mCRC with heterogeneity in MMR/microsatellite instability status. Clinical issues that may arise in these rare patients are discussed in detail.


Molecular Oncology Findings
About this analysis: Alterations to known oncogenes, known tumor suppressors, and/or potentially treatable genes are gathered by this analysis. Small somatic variants presented here must be classified as "Pathogenic" or "Likely Pathogenic" according to the method described in the Additional Mutation Analysis section below. Treatable genes are those identified in a targeted treatment above and are not necessarily known oncogenes or tumor suppressors. See Methods section for more details.
A total of 57,178 somatic variants were identified in this patient's tumor, including 119 non-synonymous variants, for an estimated exonic mutation rate of 5.2 mutations per megabase (Mb).
Figure: Tumor sample's exonic mutation rate (red line) compared to 5…
Of the 102 oncogenes and 137 tumor suppressors analyzed here, 3 small variants (predicted to be Pathogenic, Likely Pathogenic, and/or treatable) and 7 copy number alterations were detected. These variants are presented in the tables below, with oncogenes labeled in RED and tumor suppressors labeled in BLUE.

Protein Analysis

Genes Associated with Chemotherapy Response
About this analysis: Expression levels for 8 genes associated with differential chemotherapy response are presented here. An empirically-derived cutoff for each gene is used to determine if its expression level is High, which may indicate the tumor's sensitivity (or resistance) to the associated drugs. See Methods section for more details.
The expression data for genes associated with chemotherapy response are summarized in the tables below.

Additional Mutation Analysis
About this analysis: The pathogenicity of somatic small mutations is predicted using a heuristics-based procedure oriented towards cancer. Information at the gene level (oncogene or tumor suppressor, driver of tumorigenesis, etc.) and information specific to the mutation (activating or disruptive, conservation score, location within a recurrently-mutated hotspot, etc.) are used. While all somatic small variants are analyzed by this procedure, variants in Known Oncogenes and Tumor Suppressors classified as "Pathogenic" or "Likely Pathogenic" are not presented in detail here, as they are reported in the Molecular Oncology Findings analysis above. See Methods section for more details.
All somatic variants identified in this patient sample were categorized according to each variant's annotation and supporting data as described in the Methods. A summary of mutations broken down by variant class and pathogenicity category is provided below. Please note that mutations listed in the Molecular Oncology Findings section are included in the summary table below.

Copy number analysis
About this analysis: A survey of relative coverage and allelic imbalances is performed across the whole genome to identify regions exhibiting altered copy number.
Only regions harboring at least one gene of interest are reported, up to a maximum of 10 regions with the highest and lowest relative coverage estimates. Reported regions must be at least 10 kb in size. See Methods section for more details.
A total of 15,058 regions were identified according to the methods described below. The tables below list up to 10 regions of the tumor genome that harbor at least one gene of interest with the highest and lowest copy number estimates. Copy number was computed by converting relative coverage estimates using a purity estimate of 78% and a ploidy estimate of 1.8n.

Based on an analysis of the patient's germline data, a total of 8 variants implicated in altered toxicity of anti-cancer drugs were present in 5 genes.
Toxicity-related findings linked to variants detected in the patient. No toxicity-related somatic variants were detected in the patient.

Status of HRD-Related Genes
About this analysis: In tumors with large contributions of the BRCA1/2 mutational signatures, this analysis attempts to find genetic factors that may explain homologous recombination deficiency (HRD), such as disruptive somatic and/or germline variants, copy number alterations, and expression status of genes related to HRD (BRCA1, BRCA2, PALB2, RAD51C). See Methods section for more details.
Fewer than 5% of somatic mutations in the tumor are attributed to the BRCA1/2 signature (0.0%), so homologous recombination is unlikely to be deficient in this tumor. The patient's sequencing data were analyzed for alterations to genes related to homologous recombination deficiency; the summary of findings is provided in the table below.

The patient's breast cancer subtype was predicted to be Her2+. This overall subtype prediction and the probabilities of the sample being associated with each of the 5 major breast cancer subtypes are provided in the table below.

Status of HRD-related genes
Intrinsic breast cancer subtype prediction
Note: The reported probabilities always sum to 100%, even in settings where prediction of breast cancer subtype is inappropriate, such as when this model is applied to non-breast cancers.

Colorectal Consensus Molecular Subtype (CMStype)
This patient's CRC CMS subtype is predicted to be CMS2 (Canonical). This subtype is characterized by epithelial features, chromosomal instability, and marked WNT and MYC signaling activation, and is found in ∼37% of the CRC population.

Immune Cell Status
About this analysis: The average expression levels of 23 immune cell signatures are computed using RNA sequencing data. The inferred state of each signature is derived using a background distribution of samples within similar ICD10 categories (when an ICD10 code for the sample is available). Elevated expression of immune cell signatures in the tumor biopsy may suggest that an immunotherapy approach could be effective. See Methods section for more details.
Since this patient's tumor does not have an assignable ICD10 code, the inferred immune cell infiltration data presented in the table below is based on a background distribution of all observed unassignable samples.

Detection of Viral Sequence
No viral sequences were detected in the patient's sequencing data.

Complex Phenomena
The following complex phenomenon was detected: kataegis. The figure below indicates in red the locations in the genome where this phenomenon was detected.

Mutational Signatures
About this analysis: Using non-negative matrix factorization (NMF) on counts of mutated triplets detected in the tumor sample, the contributions of 30 mutational signatures characterized by the Sanger Institute are estimated. These mutational signatures can reveal DNA damage caused by exposure to carcinogens (e.g. tobacco smoke), dysfunction of DNA repair mechanisms (e.g. BRCA1/2 or MMR genes), and/or the activity of a variety of mutational processes (e.g. APOBEC/AID). See Methods section for more details.
In this sample, a total of 29,016 single nucleotide variants were identified. A figure plotting the distribution of mutations according to their genomic contexts can be found in the appendix. The table below shows the 10 mutational signatures believed to be active in this patient.
Active mutational signatures identified in this sample.
Because the percentage of mutations attributed to the APOBEC/AID mutational signatures (5.2%) exceeds the 5% threshold, activity of the APOBEC/AID family is likely elevated in this tumor.
The patient's sequencing data were analyzed for multiple genetic factors known to be associated with increased activity of the APOBEC/AID family of cytidine deaminases. The results of the analysis are provided in the table below.
Genetic factors associated with APOBEC/AID mutational signatures
* The fusion is composed of UTR sequence from the upstream gene, and a new translation initiation site was predicted to produce a functional fusion gene. Because the actual protein composition cannot be confirmed, the predicted fusion is described as p.0? (probably no protein) and p.? (unknown protein) for the upstream and downstream gene contributions, respectively.
* Spanning read support is provided in the format "T:X, N:Y", where T and N refer to the tumor and matched-normal DNA sequencing datasets, respectively.

Junction Analysis
About this analysis: Read support for junction sequences specific to 3 clinically important gene fusions and alternatively spliced genes is quantified from RNA sequencing data. Junctions supported by the sequencing data are reported here along with their potential therapeutic implications. See Methods section for more details.
No support for any of the 3 junction sequences analyzed here was found in the patient's sequencing data.

Structural variants
Of the 131 structural variants identified that overlap at least one gene, 2 affect at least one cancer gene. Up to 10 of the top-ranked structural variants are listed in the table below; the full list of structural variants can be found in the supplemental files.

Small variants were annotated with base-level PhastCons conservation scores, population allele frequencies from dbSNP (Build 142), and their predicted impact on genes. Each small variant predicted to alter the protein sequence of a gene is further analyzed by a proprietary de novo assembly algorithm that realigns all reads surrounding the variant from both tumor and matched-normal samples, to increase confidence that the detected somatic or germline variant is real.
RNA-Seq libraries were prepared for the tumor sample using KAPA Stranded RNA-Seq with RiboErase kit and sequenced on the Illumina sequencing platform. RNA sequencing reads were aligned by bowtie2 using default parameters to the RefSeq transcriptome and analyzed by RSEM. Transcriptome-based alignments are converted to reference genome coordinates to detect expression of variants identified in the DNA sequencing data.
Normalized gene-level TPMs are used to determine whether the gene is High or Over-Expressed based on cutoffs derived from orthogonal technologies and/or datasets. Four techniques for establishing cutoffs were used: optimizing TPM concordance for preset levels, identifying cutoffs that maximize concordance between platforms, selecting cutoffs that maximize positive predictive value (PPV), and using alternative datasets with outcomes or phenotypes of interest. Orthogonal testing techniques included proteomics selected reaction monitoring (SRM) and alternative RNA-Seq library preparations (poly-A).
Note that the minimum tumor purities listed in the table below refer to purities measured post-microdissection. Tumor specimens with lower purities are often acceptable, provided that microdissection can raise the purity above the values specified below.

Findings are curated from multiple sources, such as primary literature and FDA drug labels. When genomic and/or transcriptomic biomarkers associated with these findings are detected in the patient's sequencing data (somatic or germline), the finding is summarized by this analysis.

Molecular Oncology Findings Version 2.1
Variants occurring in 239 known cancer genes and those deemed treatable in other analyses of this report are reported here. Known cancer genes are classified as tumor suppressors or oncogenes using data available from the COSMIC Cancer Gene Census [1].
Only small variants predicted to be Pathogenic or Likely Pathogenic are presented here, using the heuristic procedure described in Additional Mutation Analysis. All variants occurring in treatable genes are reported.
Mutational burden is classified as HIGH in tumors with 200 or more non-synonymous mutations, which is associated with clinical benefit from anti-PD-1 therapy [2]. Tumors with fewer than 200 non-synonymous mutations are classified as having LOW mutational burden.

An additional 4 protein markers are measured by the LungAdenoPlex assay for lung cancers: p63, K7, K5, TTF1. When available, p16 protein expression is measured to indicate potential HPV infection in head & neck, cervical, anal, and rectal cancers, and KRAS protein expression is measured to predict poor prognosis in gastroesophageal and endometrial cancers.

Genes Associated with Chemotherapy Response Version 1.3
Expression levels for the following 8 genes associated with differential response to chemotherapy are reported here: TOP1 (TOPO1), TYMP, SLC29A1 (hENT1), FOLR1 (FR-alpha), TOP2A (TOPO2A), RRM1, TUBB3, MGMT. For each gene, the empirically-derived cutoff and the median expression level within previously-assayed clinical samples are provided. Expression status is classified as High if the sample's TPM is higher than its associated cutoff. The TPM cutoffs were optimized to best approximate the following proteomics-based cutoffs: TOPO1 = 2,075, TYMP = 2,600, hENT1 = 338, FR-alpha = 1,300, TOPO2A = 1,570, RRM1 = 390, TUBB3 = 1,000, and MGMT = 200 amol/µg. This approximation to proteomics is presented so that results can be compared with those of previously tested samples, with the caveat that new samples will have inherent heterogeneity.

Additional Mutation Analysis Version 2.1
All somatic variants are classified into the following 5 categories: "Pathogenic", "Likely Pathogenic", "Variant of Unknown Significance", "Likely Benign", and "Benign". The variant's category is determined using a combination of variant class (e.g. Missense), amino acid change, PhastCons conservation score of the mutated site, gene type (i.e. Oncogene, Tumor Suppressor, or neither), driver status (e.g. Driver Gene), variant allele frequency in the population from dbSNP, and whether the variant is located inside a gene's mutational hotspot.
The disruptiveness of a particular amino acid change is calculated according to a Conservation-controlled Amino acid Substitution Matrix (CASM) score [11], with parameters estimated using Five3 variant calls on >5,000 TCGA tumor exomes and their matched normals. PhastCons was downloaded from the UCSC Genome Browser. Gene type is obtained using data from the COSMIC Cancer Gene Census [1]. Driver status is obtained from a pan-cancer publication across 15+ TCGA cancer types [12]. Clusters of mutations were discovered using OncodriveCLUST on the variant calls made on >5,000 TCGA tumor exomes [13]. Variants found in the COSMIC database (release v76) are annotated with the number of COSMIC samples harboring mutations that cause the same protein change [1].
Mutation clonality is determined using the purity and ploidy estimates produced by Cytogenetic Analysis, when available. These estimates are used to transform the local relative coverage of the copy number segment harboring the mutation into somatic copy number, which is used to determine the posterior probability that at least 75% of tumor cells harbor the mutation [14]. A mutation is deemed clonal if its posterior probability exceeds 0.75, subclonal if it is less than 0.25, and otherwise undetermined.
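As an illustration of the clonality rule above, the following Python sketch computes the posterior probability that at least 75% of tumor cells carry a mutation. The binomial read model, the uniform prior over cancer cell fraction (CCF) on a discrete grid, the mutation multiplicity of 1, and the function names are assumptions for illustration, not the method of [14] or of this report.

```python
import math


def prob_clonal(alt_reads, depth, purity, tumor_cn, multiplicity=1, grid=100):
    """Posterior P(CCF >= 0.75) under a uniform prior on a CCF grid (assumption)."""
    # Total DNA copies at the locus in the mixture: tumor cells plus diploid normal cells.
    copies = purity * tumor_cn + 2.0 * (1.0 - purity)
    logliks = []
    for k in range(1, grid + 1):
        ccf = k / grid
        vaf = purity * multiplicity * ccf / copies
        vaf = min(max(vaf, 1e-9), 1 - 1e-9)  # keep log() finite
        # Binomial log-likelihood without the constant binomial coefficient.
        logliks.append((ccf, alt_reads * math.log(vaf) + (depth - alt_reads) * math.log(1 - vaf)))
    peak = max(ll for _, ll in logliks)
    weights = [(ccf, math.exp(ll - peak)) for ccf, ll in logliks]
    total = sum(w for _, w in weights)
    return sum(w for ccf, w in weights if ccf >= 0.75) / total


def clonality_call(p):
    """Thresholds taken from the text: > 0.75 clonal, < 0.25 subclonal."""
    if p > 0.75:
        return "clonal"
    if p < 0.25:
        return "subclonal"
    return "undetermined"
```

With the reported purity of 78% and a diploid segment, a variant seen in 39 of 100 reads is consistent with every tumor cell carrying one mutant copy, whereas 10 of 100 reads points to a minority subclone.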
Design of the figure comparing this tumor sample's exonic mutation rate to mutation rates of tumors sequenced by TCGA is attributed to Gad Getz and his colleagues at the Broad Institute.

Raw genome-aligned RNA sequencing data are scanned for support of single nucleotide variants in the COSMIC database (build v55) in known oncogenes and tumor suppressors that lack support in the patient's DNA sequencing data. To reduce false positives from such an approach, the support in the RNA sequencing data must be significant, with at least 4 unique reads representing at least 25% of reads.

Version 1.6 DNA sequencing data from both tumor and normal tissues are scanned for relevant germline variants (i.e. nonsense SNVs and frame-shifting insertions & deletions) in the following 22 genes implicated in increased risk of developing cancer: APC, BMPR1A, BRCA1, BRCA2, MEN1, MLH1, MSH2, MSH6, NF2, PMS2, PTEN, RB1, SDHB, SDHC, SDHD, SMAD4, STK11, TP53, TSC1, TSC2, VHL, WT1, which are included in ACMG's recommendations for reporting incidental findings [15]. Variants must be sequenced to a minimum depth of 10 reads and have a minimum alternate allele fraction of 0.25 in the normal sequencing data to be reported.
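The two read-count filters described above (RNA-only support for COSMIC variants, and the germline reporting thresholds) reduce to simple predicates. In this minimal sketch the function names are hypothetical, and treating both thresholds as inclusive is an assumption based on the wording "at least" and "minimum".

```python
def rna_rescue_supported(alt_reads: int, total_reads: int,
                         min_reads: int = 4, min_fraction: float = 0.25) -> bool:
    """RNA-only support for a COSMIC variant that lacks DNA support (hypothetical helper)."""
    if total_reads == 0:
        return False
    return alt_reads >= min_reads and alt_reads / total_reads >= min_fraction


def germline_reportable(depth: int, alt_fraction: float,
                        min_depth: int = 10, min_alt_fraction: float = 0.25) -> bool:
    """Depth and alternate-allele-fraction thresholds applied to the normal sample."""
    return depth >= min_depth and alt_fraction >= min_alt_fraction
```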

Secondary Screening for Cancer Predisposition
Known pathogenic variants found in BRCA1/2 are identified using a database of clinically important BRCA1/2 germline variants extracted from NHGRI's Breast Cancer Information Core (BIC) [16].
Microsatellite Instability Version 1.2
Instability of microsatellite repeats is estimated using the method described here [17]. A set of 2,848 microsatellites consisting of homopolymer repeats was analyzed for a statistically significant increase in the number of length polymorphisms in both the tumor and the matched-normal (if available) sequencing data. The background mean (µ) and standard deviation (σ) of the number of length polymorphisms for each microsatellite locus were computed across approximately 5,000 blood and solid normal exomes sequenced by TCGA, comprising 18 different cancer types. Loci covered by fewer than 30 reads are excluded from the analysis. For each microsatellite locus, the number of differently-sized repeats is counted for each sample: repeats with read support exceeding 5% of the read support of the maximally-supported repeat are tallied for a total count of differently-sized repeats, n. The total number of unstable microsatellites is counted in each sample, where a given microsatellite i is deemed unstable if n_i > µ_i + 3σ_i. The percentage of unstable loci is calculated for the tumor and the matched normal, and the differential is determined by subtracting the percentage of unstable loci in the normal sample from the percentage of unstable loci in the tumor. A tumor is considered to demonstrate microsatellite instability (MSI) when the differential exceeds the threshold specified in the results.

Copy number analysis Version 1.5
Relative coverage and majority allele fraction of the tumor sample versus its matched normal were estimated using a single-pass segmentation algorithm that merges fixed-width contiguous regions of the genome unless the estimates of the relative coverage and majority allele fraction (when available) of the regions differ in a statistically significant manner (i.e. by more than 3 standard deviations). The regions output by the single-pass segmentation algorithm are corrected for estimated GC bias.
Variable regions with the weakest support are merged with the neighboring region that best matches the region's estimates, and then the newly neighboring regions are merged using the same significance criteria as before. This last step is iteratively performed until regions can no longer be merged together.
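A minimal sketch of the first, single-pass merge described above, assuming a single per-bin estimate (for example relative coverage) with a known standard error and an inverse-variance weighted segment mean. The report merges relative coverage and majority allele fraction jointly and then applies GC correction and the iterative re-merging of weakly supported regions; none of that is modeled here, and the function name is hypothetical.

```python
import math


def single_pass_segment(bins, z=3.0):
    """Merge contiguous fixed-width bins unless they differ by more than z standard errors.

    bins: list of (estimate, standard_error) in genome order.
    Returns a list of (segment_mean, number_of_bins).
    """
    segments = []  # each entry: [weighted_sum, total_weight, n_bins]
    for estimate, se in bins:
        weight = 1.0 / (se * se)  # inverse-variance weight
        if segments:
            seg = segments[-1]
            seg_mean = seg[0] / seg[1]
            seg_se = math.sqrt(1.0 / seg[1])
            # Merge when the difference is within z combined standard errors.
            if abs(estimate - seg_mean) <= z * math.sqrt(se * se + seg_se * seg_se):
                seg[0] += weight * estimate
                seg[1] += weight
                seg[2] += 1
                continue
        segments.append([weight * estimate, weight, 1])
    return [(s[0] / s[1], s[2]) for s in segments]
```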
Copy number status for a given region is de[…] […] are not plotted unless the segment harbors at least one cancer gene and is among the top 10 segments with the highest or lowest relative coverage.
Copy number is computed from relative coverage estimates using the purity and ploidy estimates produced by Cytogenetic Analysis, when available.
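The conversion from relative coverage to copy number can be written down under two assumptions not stated in the report: relative coverage is normalized so that the genome-wide average is 1, and normal cells are diploid. Under those assumptions a relative coverage of 1.0 maps back to the tumor ploidy itself, as the example with this tumor's reported estimates (purity 78%, ploidy 1.8n) shows. Function names are hypothetical.

```python
def copy_number_from_rc(rc: float, purity: float, ploidy: float) -> float:
    """Tumor copy number implied by a relative coverage value (assumes diploid normal cells)."""
    normal = 1.0 - purity
    mean_copies = purity * ploidy + 2.0 * normal  # average copies per cell in the mixture
    return (rc * mean_copies - 2.0 * normal) / purity


def rc_from_copy_number(cn: float, purity: float, ploidy: float) -> float:
    """Inverse mapping: expected relative coverage of a segment with tumor copy number cn."""
    normal = 1.0 - purity
    return (purity * cn + 2.0 * normal) / (purity * ploidy + 2.0 * normal)


# Estimates reported for this tumor: purity 78%, ploidy 1.8n.
print(round(copy_number_from_rc(1.0, 0.78, 1.8), 3))  # 1.8
```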
Oncogenes that experience gains in copy number ("Moderate Amp." or "Amplification") and tumor suppressor genes that experience losses are highlighted as significant findings if the genes are deemed "High Confidence Drivers" by a study of the mutations detected in 3,000+ exomes sequenced by TCGA [12].

A set of genetic factors attributed to homologous recombination deficiency (HRD) is scanned for in the patient's sequencing data. These factors include germline and somatic nonsense and frameshifting variants, copy number alterations, and expression levels in genes related to HRD: BRCA1, BRCA2, PALB2, RAD51C. The BRCA1/2 mutational signature is predicted as active if more than 5% of somatic mutations are attributed to it.
Cytogenetic Analysis Version 1.12
This analysis uses the copy number segments described in Copy Number Analysis to estimate the amount of normal contamination, α, and the tumor ploidy, which can be used to transform the relative coverage estimates into allele-specific, integral copy number states where possible.
Best-fit parameters for α and tumor ploidy are found by gradient descent. Each round of gradient descent is initialized with random values for α and ploidy and attempts to maximize the joint log-likelihood of the relative coverage, rc, and majority allele fraction, af, estimates, weighted according to segment size across the 22 autosomes. Gradient descent is performed in this manner at least 10 times, and the best-fit parameters across all rounds are reported.
In the joint log-likelihood calculation, a set of common allelic states is used to determine the expected relative coverage and majority allele fraction values for each state, given α and tumor ploidy. These states include commonly altered states such as single-copy gain (2, 1), loss of heterozygosity or LOH (1, 0), and copy-neutral loss of heterozygosity or CN-LOH (2, 0), where the numbers in parentheses are the majority and minority allelic copy numbers, (A, B), that describe an allelic state. Additionally, less common states such as balanced amplification (2, 2) and subclonal states representing a 50/50 mixture of subclones with and without an altered allelic state are also used.
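Under the same assumptions as a simple purity/ploidy model (genome-wide average relative coverage of 1, diploid normal cells), the expected relative coverage and majority allele fraction for an allelic state (A, B) follow directly from α and tumor ploidy. The sketch below scores the common states named above, plus the unaltered (1, 1) state, with a Gaussian log-likelihood. The standard deviations are hypothetical, subclonal mixture states are omitted, and the gradient-descent search over α and ploidy is not shown.

```python
# Common allelic states (A, B); subclonal 50/50 mixtures are omitted in this sketch.
STATES = {
    "single-copy gain (2, 1)": (2, 1),
    "LOH (1, 0)": (1, 0),
    "CN-LOH (2, 0)": (2, 0),
    "balanced (1, 1)": (1, 1),
    "balanced amplification (2, 2)": (2, 2),
}


def expected_rc_af(a: int, b: int, alpha: float, ploidy: float):
    """Expected relative coverage and majority allele fraction for state (A, B)."""
    tumor = 1.0 - alpha
    copies = tumor * (a + b) + 2.0 * alpha          # copies per cell at this segment
    rc = copies / (tumor * ploidy + 2.0 * alpha)    # relative to the genome-wide average
    af = (tumor * a + alpha) / copies               # normal cells add one majority allele
    return rc, af


def state_loglik(rc_obs, af_obs, a, b, alpha, ploidy, sd_rc=0.1, sd_af=0.05):
    """Gaussian log-likelihood up to a constant; sd values are hypothetical."""
    rc, af = expected_rc_af(a, b, alpha, ploidy)
    return -0.5 * ((rc_obs - rc) / sd_rc) ** 2 - 0.5 * ((af_obs - af) / sd_af) ** 2


def best_state(rc_obs, af_obs, alpha, ploidy):
    """Name of the allelic state whose expected (rc, af) best explains a segment."""
    return max(STATES, key=lambda name: state_loglik(rc_obs, af_obs, *STATES[name], alpha, ploidy))
```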
The best-fit parameters are then used to transform the original relative coverage estimates into tumor copy number, and tumor ploidy is recalculated as the average tumor copy number across the whole genome, weighted by the normalized genomic length of each segment.

This report application uses RNA-Seq data to predict the site of origin for a given tumor sample. It compares the gene-level transcriptional profile of this sample with those of a reference cohort of RNA-Seq data from clinical and publicly available research samples. Predictions are based on a subset of the most variable genes.

Site of Origin Prediction
Sample similarity is computed using Spearman correlation of the samples' transcriptional profiles, which is often used in genomic studies to determine nearest neighbors for a given sample. The tumors to which this sample is most similar can help to inform clinical decisions.
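A minimal, dependency-free sketch of ranking reference samples by Spearman correlation, computed as the Pearson correlation of rank vectors with tied values given their average rank. The reference profiles are assumed to be already restricted to the same ordered subset of highly variable genes; the sample names and values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)  # undefined for constant profiles


def nearest_neighbors(sample, reference, k=3):
    """Rank reference profiles (name -> expression list) by similarity to the sample."""
    scored = [(name, spearman(sample, profile)) for name, profile in reference.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```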

Inferred Hormone Receptor Status
Version 1.0 IHC-based receptor calls and corresponding gene expression values were obtained from TCGA breast cancer datasets [23]. Optimal thresholds in gene expression values for agreement with IHC calls were found using Youden J-index analysis. Using 10-fold cross-validation, the accuracy of this approach was determined to be 94%, 84%, and 85% for ER, PR, and Her2, respectively, in held-out TCGA breast cancer samples. In an external breast cancer cohort, the accuracies were determined to be 83%, 73%, and 86% for ER, PR, and Her2, respectively.
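The Youden J-index threshold search mentioned above can be sketched as follows: every observed expression value is tried as a cutoff, and the one that maximizes sensitivity + specificity - 1 against the IHC calls is kept. Calling a sample positive when its value is at or above the cutoff is an assumption, and the function name is hypothetical.

```python
def youden_threshold(values, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    values: expression values; labels: True where the IHC call is positive.
    A sample is called positive when its value is >= cutoff (assumption).
    """
    n_pos = sum(1 for positive in labels if positive)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, positive in zip(values, labels) if positive and v >= cutoff)
        tn = sum(1 for v, positive in zip(values, labels) if not positive and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```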
Breast Cancer Subtyping Version 1.0
PAM50 subtype calls and gene expression levels were obtained from the supplemental information of the TCGA breast cancer landscape paper [23]. This dataset was divided into a 70/30 split of training and testing sets, respectively. The model used here was trained on the training set and achieved >92% subtyping accuracy in the held-out samples of the testing set. The subtype assignments produced by this model are slightly more prognostic than the original 2015 PAM50 labels, and have been validated to be significantly prognostic in two independent breast cancer cohorts.
Colorectal Consensus Molecular Subtype (CMStype) Version 1.0
The CRC Consensus Molecular Subtypes (CMStypes) were first identified by consolidating the subtyping efforts of 6 independent research groups on microarray expression profiles from >4,000 colorectal samples. A random forest classifier was then designed to reliably classify unseen samples into these subtypes.
The results shown here are a novel implementation of CMStype classification designed for use with Nant RNAseq data. CMStype labels were obtained for over 1400 clinical cases and used to train a multi-class logistic regression model. This modeling strategy achieves >97% concordance with CMStypes from the original authors in unseen CRC samples.
Expression analysis Version 1.0
Any available RNA-Seq data for the patient are processed by RSEM [25] to estimate transcripts per million (TPM) and fragments per kilobase of exon per million fragments mapped (FPKM) for each isoform. Gene-level TPM and FPKM estimates are computed as a weighted average of the isoform estimates, weighted by the percentage of expression that RSEM attributes to each isoform among all isoforms of the gene in the sample.
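Following the description above, the gene-level estimate is a weighted average of isoform estimates, which is not the same as summing isoform TPMs. In this sketch the weights are assumed to be RSEM's per-isoform percentage (the IsoPct column of RSEM's isoform-level output); the function name and input layout are hypothetical.

```python
def gene_level_estimates(isoforms):
    """Weighted average of isoform TPM/FPKM, weighted by each isoform's percentage.

    isoforms: list of (tpm, fpkm, iso_pct) for one gene, where iso_pct is
    RSEM's IsoPct value (assumed to be the weight the text describes).
    """
    total_pct = sum(pct for _, _, pct in isoforms)
    if total_pct == 0:
        return 0.0, 0.0  # no expressed isoforms
    tpm = sum(t * pct for t, _, pct in isoforms) / total_pct
    fpkm = sum(f * pct for _, f, pct in isoforms) / total_pct
    return tpm, fpkm
```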
Gene-level TPMs are used to determine whether the gene is "Over-expressed", "Under-expressed", "Not expressed", or "Normal" using the lower and upper 5th percentiles of per-gene RSEM TPM values for a collection of RNA-Seq datasets from TCGA normal samples. The expression status for a gene is classified as "Over-expressed" if its TPM exceeds the gene's upper 5th percentile, "Under-expressed" if it is less than the lower 5th percentile, "Not expressed" if its TPM value equals zero, and "Normal" otherwise. If a gene's upper or lower 5th percentile is unavailable, the expression status for that gene is classified as "N/A".
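The status rules above can be sketched as follows. Reading "lower and upper 5th percentiles" as the 5th and 95th percentiles of the normal-sample TPMs, and checking "Not expressed" before "Under-expressed" when TPM is zero, are both assumptions; `statistics.quantiles` is used here only as one convenient way to estimate the percentiles, and the function names are hypothetical.

```python
import statistics


def normal_bounds(normal_tpms):
    """5th and 95th percentiles of a gene's TPM across normal samples."""
    cuts = statistics.quantiles(normal_tpms, n=20)  # 19 cut points: 5th, 10th, ..., 95th
    return cuts[0], cuts[-1]


def expression_status(tpm, bounds):
    """bounds: (lower, upper) percentiles, or None when unavailable."""
    if bounds is None:
        return "N/A"
    lower, upper = bounds
    if tpm == 0:
        return "Not expressed"
    if tpm > upper:
        return "Over-expressed"
    if tpm < lower:
        return "Under-expressed"
    return "Normal"
```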
Immunotherapy Markers Version 2.2
Microsatellite instability (MSI) status is determined using the methods described in the Microsatellite Instability section. Total mutation count reports the number of non-synonymous mutations present within the tumor sample. These metrics have previously been shown to be associated with response to immunotherapy [27].
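As a sketch of these two markers: the MSI differential follows the locus test from the Microsatellite Instability section (repeat sizes supported by more than 5% of the top repeat's reads, instability when n_i > µ_i + 3σ_i, loci under 30 reads skipped), and mutational burden uses the 200 non-synonymous mutation cutoff from Molecular Oncology Findings. The data layout and function names are hypothetical, and because the MSI decision threshold is only "specified in the results", the sketch returns the differential rather than a call.

```python
def distinct_repeat_sizes(support):
    """support: dict repeat length -> supporting reads at one locus."""
    top = max(support.values())
    return sum(1 for reads in support.values() if reads > 0.05 * top)


def percent_unstable(sample, background, min_coverage=30):
    """sample: locus -> support dict; background: locus -> (mu, sigma) from normals."""
    evaluable = unstable = 0
    for locus, support in sample.items():
        if locus not in background or sum(support.values()) < min_coverage:
            continue  # no background model, or too few reads
        mu, sigma = background[locus]
        evaluable += 1
        if distinct_repeat_sizes(support) > mu + 3 * sigma:
            unstable += 1
    return 100.0 * unstable / evaluable if evaluable else 0.0


def msi_differential(tumor, normal, background):
    """Percentage of unstable loci in the tumor minus that in the matched normal."""
    return percent_unstable(tumor, background) - percent_unstable(normal, background)


def tmb_class(nonsynonymous_count, cutoff=200):
    return "HIGH" if nonsynonymous_count >= cutoff else "LOW"
```

With the 119 non-synonymous variants reported for this tumor, `tmb_class(119)` returns "LOW".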
A patient's RNA-Seq data are examined for expression of immune checkpoints using the methods described in the About This Test section. High expression of key immune checkpoint genes (PD-1, PD-L1, PD-L2 and CTLA-4) may indicate active suppression of immune cells by the tumor microenvironment [26]. High expression of IDO1 and TIM-3 may indicate immune tolerance mechanisms employed by the tumor [28,29]. For each gene, the median expression level within previously-assayed clinical samples is provided.
Mutational signatures are determined by the methods described in the Mutational Signatures section. APOBEC- and POLE-related mutational signatures that contribute more than 5% of all mutations observed in the sample are reported here.
Immune Cell Status Version 1.1
A panel of 109 genes that accurately discriminates between 23 immune cell subpopulations was used as the basis of this analysis [30]. For each of these 23 immune cell signatures, the average expression level of the genes in the signature was calculated. These mean expression levels are then compared with those of similar samples (based on ICD10 category, when available) to infer whether activation is above or below the expected range for the cancer's tissue type.
Detection of Viral Sequence Version 1.1
For samples aligned to the sequences of viruses implicated in cancer, the coverage of each viral genome is determined as the median read depth across all positions of the viral genome. Any viral genome with a median coverage greater than 3x in the primary and/or matched-normal samples is reported, but only those with median coverage greater than 10x are highlighted.
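A minimal sketch of the coverage rule above, assuming per-position read depths for one viral genome are available for the tumor and, optionally, the matched normal. Taking the larger of the two medians is how this sketch implements "primary and/or matched-normal"; the function name is hypothetical.

```python
import statistics


def viral_status(tumor_depths, normal_depths=None, report_min=3.0, highlight_min=10.0):
    """Classify one viral genome from per-position read depths."""
    medians = [statistics.median(tumor_depths)]
    if normal_depths is not None:
        medians.append(statistics.median(normal_depths))
    best = max(medians)  # either sample may trigger reporting
    if best > highlight_min:
        return "highlighted"
    if best > report_min:
        return "reported"
    return "not reported"
```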

Complex Phenomena
Version 0.1 Somatic variants identified in the tumor genome are used to detect evidence of three complex phenomena: kataegis, extreme gains in copy number, and clustered rearrangements. Kataegis is a pattern of dense clusters of hypermutated bases, often found near somatic rearrangements. Clustered rearrangements can be evidence of a process called chromothripsis, whereby the genome is shattered into hundreds of pieces that are then randomly put back together, resulting in segments of the tumor genome that are highly rearranged and frequently lose genomic material. When clustered rearrangements are found in a region that also exhibits extreme copy number, this signals the possibility that a double minute chromosome is present in the tumor genome.
Point mutations with scores of at least 10 that lie no more than 2 kb from their nearest neighbor are considered candidate mutations in the kataegis analysis. If a region of the genome contains a minimum of 10 candidate mutations and has a density of at least 10 candidate mutations per 100 kb, it is classified as a potential kataegis event.
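The kataegis criteria above can be sketched as follows. Here a "region" is assumed to be a run of consecutive candidates with no gap larger than 2 kb, and the score is whatever per-mutation quality score the caller supplies; the report does not define either further, so both are assumptions, as is the function name.

```python
def kataegis_events(mutations, min_score=10, max_gap=2_000,
                    min_count=10, min_density=10, window=100_000):
    """mutations: iterable of (chrom, pos, score). Returns (chrom, start, end, n) tuples."""
    by_chrom = {}
    for chrom, pos, score in mutations:
        if score >= min_score:
            by_chrom.setdefault(chrom, []).append(pos)
    events = []
    for chrom, pos in sorted(by_chrom.items()):
        pos.sort()
        # Candidate: a scored mutation whose nearest neighbour is within max_gap bp.
        cands = [p for i, p in enumerate(pos)
                 if (i > 0 and p - pos[i - 1] <= max_gap)
                 or (i + 1 < len(pos) and pos[i + 1] - p <= max_gap)]
        run = []
        for p in cands + [None]:  # None flushes the final run
            if run and (p is None or p - run[-1] > max_gap):
                span = run[-1] - run[0] + 1
                if len(run) >= min_count and len(run) * window / span >= min_density:
                    events.append((chrom, run[0], run[-1], len(run)))
                run = []
            if p is not None:
                run.append(p)
    return events
```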
Regions exhibiting extreme gains in copy number are defined as those with relative coverage (versus matched normal) exceeding 5.0 over a span of at least 100 Kb.
Structural variants are required to have a minimum support of at least 6 reads with an average mapping quality greater than 30. Clustered rearrangements are defined as a region containing no fewer than 5 structural variants (separated by at least 10 Kb), with a breakpoint density of at least 5 breakpoints per 1 Mb.
Mutational Signatures Version 1.1
The bases directly adjacent to the mutated site are used to determine the genomic context of the site, which can help to determine whether a particular mutagen (e.g. tobacco smoke, exposure to ultraviolet light) or mutational process is active in the sample.
The exposure to each of the 30 signatures identified by Sanger was calculated using non-negative matrix factorization (NMF) on the counts of mutated triplets identified in the tumor sample [33]. "Active" signatures are those that contribute at least 100 mutations (representing a minimum of 2% of all mutations) or more than 25% of all mutations in the sample.
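The sketch below estimates signature exposures for a single sample with the signatures held fixed, using the Lee and Seung multiplicative update for non-negative factorization under a Poisson (KL) objective, and then applies the "active" rule above. Whether the report fits exposures this way is not stated; the update choice, iteration count, and function names are assumptions.

```python
def fit_exposures(counts, signatures, iterations=500):
    """Estimate non-negative exposures h with signatures W fixed, so counts ~ W h.

    counts: mutation counts per trinucleotide context (length K).
    signatures: list of S signatures, each a length-K list of non-negative weights.
    Uses the multiplicative update h <- h * (W^T (V / W h)) / (W^T 1).
    """
    n_sig, n_ctx = len(signatures), len(counts)
    h = [sum(counts) / n_sig] * n_sig  # flat positive starting point
    for _ in range(iterations):
        recon = [sum(h[s] * signatures[s][k] for s in range(n_sig)) for k in range(n_ctx)]
        h = [
            h[s] * sum(signatures[s][k] * counts[k] / recon[k]
                       for k in range(n_ctx) if recon[k] > 0) / sum(signatures[s])
            for s in range(n_sig)
        ]
    return h


def active_signatures(exposures):
    """Rule from the text: (>= 100 mutations and >= 2%) or > 25% of all mutations."""
    total = sum(exposures.values())
    return [name for name, n in exposures.items()
            if (n >= 100 and n / total >= 0.02) or n / total > 0.25]
```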
Activity of APOBEC/AID Family Version 1.0
A set of genetic factors attributed to increased APOBEC/AID activity is scanned for in the patient's sequencing data. These factors include germline and somatic nonsense and frame-shifting variants in the related genes, associated SNPs, deletion of APOBEC3B, and any detected viruses. The related genes are APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, and AICDA. The dbSNP identifier for the associated SNP is rs1014971. The APOBEC/AID family is predicted as active if more than 5% of somatic mutations are attributed to its mutational signature. Deletion of APOBEC3B is determined by comparing the read coverage of APOBEC3B with the average coverage of its neighboring genes APOBEC3A and APOBEC3C. APOBEC3B is predicted to be homozygously deleted if its relative coverage falls below 0.25, heterozygously deleted if its relative coverage falls below 0.75, and otherwise not deleted.

The intragenic fusions known as EGFRvIII and MET exon 14 skipping are also identified by this method. When there is significant spanning read support, de novo assembly is performed on all sequencing data surrounding the approximate locations of the fusion in both genes to determine the precise location of the fusion within the genes' transcripts. When a precise location can be determined, the fusion transcript is generated […]

The raw RNA sequencing data are scanned for any support of junction sequences specific to the following 3 gene fusions and splicing variants: AR-V7, EGFRvIII, MET Exon 14 Skip.
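A minimal sketch of scanning RNA reads for junction-specific sequences on either strand. Exact substring matching is an assumption, and the junction sequence in the example is a hypothetical placeholder: the actual AR-V7, EGFRvIII, and MET exon 14 skipping junction sequences are not given in this report.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def junction_support(reads, junctions):
    """Count reads containing each junction sequence on either strand.

    junctions: name -> junction-spanning sequence (hypothetical in this sketch).
    """
    probes = {name: (seq, reverse_complement(seq)) for name, seq in junctions.items()}
    counts = {name: 0 for name in junctions}
    for read in reads:
        for name, (fwd, rev) in probes.items():
            if fwd in read or rev in read:
                counts[name] += 1  # each read counts at most once per junction
    return counts
```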

RNA Fusion Analysis
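The targeted junction scan for AR-V7, EGFRvIII, and MET exon 14 skipping can be sketched as counting RNA reads that contain a junction-specific sequence on either strand. This is a hypothetical illustration. The junction sequence used in the check is a made-up placeholder, not a real AR-V7, EGFRvIII, or MET breakpoint. Exact substring matching is also a simplifying assumption, since a real scanner may need to tolerate sequencing errors.

```python
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_support(reads: Iterable[str], junctions: Dict[str, str]) -> Dict[str, int]:
    """Count reads containing each junction sequence on either strand."""
    # Pre-compute both orientations of every junction probe.
    probes = {
        name: (seq.upper(), reverse_complement(seq.upper()))
        for name, seq in junctions.items()
    }
    support = {name: 0 for name in junctions}
    for read in reads:
        read = read.upper()
        for name, (forward, reverse) in probes.items():
            if forward in read or reverse in read:
                support[name] += 1
    return support
```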
Structural variants (Version 1.2)
Structural variants were identified using methods described in two publications on structural variation in glioblastoma multiforme [31,32]. The method works in two stages. The first stage identifies clusters of discordantly mapped reads, i.e., read pairs that map in an unexpected location and/or orientation relative to the reference human genome. These clusters give the approximate genomic locations where a structural variant might exist. The second stage searches the regions around both ends of each candidate structural variant for "split reads" that span its two sides. Split reads, if found, provide orthogonal evidence that the structural variant may be real. They also refine its breakpoints to base-pair precision, and such variants are labeled "Precise." Structural variants in regions of the human genome that are difficult to align accurately, such as highly repetitive regions, are filtered out of the analysis.
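The first stage, which finds discordantly mapped pairs and groups them into candidate breakpoints, can be sketched as follows. The sketch assumes a standard forward-reverse paired-end library. The maximum insert size of 1,000 bp, the 500 bp clustering window, the `ReadPair` type, and the greedy clustering strategy are hypothetical choices, not parameters reported for this pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def is_discordant(pair: ReadPair, max_insert: int = 1000) -> bool:
    """Flag pairs whose mapping is inconsistent with a forward-reverse library."""
    if pair.chrom1 != pair.chrom2:
        return True  # ends map to different chromosomes
    # Order the ends so that 'left' has the smaller coordinate.
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if (left[1], right[1]) != ("+", "-"):
        return True  # unexpected orientation, e.g. both ends on one strand
    return right[0] - left[0] > max_insert  # ends map too far apart


def cluster_pairs(pairs: List[ReadPair], window: int = 500) -> List[List[ReadPair]]:
    """Greedily group pairs whose two ends both lie within `window` bp of a
    cluster's first pair, on the same chromosomes and strands.

    Assumes end 1 and end 2 are reported in a consistent order across pairs.
    """
    clusters: List[List[ReadPair]] = []
    for pair in sorted(pairs, key=lambda p: (p.chrom1, p.pos1)):
        for cluster in clusters:
            seed = cluster[0]
            if (
                (seed.chrom1, seed.strand1, seed.chrom2, seed.strand2)
                == (pair.chrom1, pair.strand1, pair.chrom2, pair.strand2)
                and abs(seed.pos1 - pair.pos1) <= window
                and abs(seed.pos2 - pair.pos2) <= window
            ):
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters
```

Each resulting cluster marks an approximate breakpoint region. That region is where the second stage would then look for split reads.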
Structural variants must be supported by at least 6 reads with an average mapping quality greater than 30. Variants that pass these criteria are then ranked by the following additive heuristics, where a higher score means a higher rank:
+2 if read support > 15;
+4 if a split-read solution was found;
+5 if the variant could create a plausible fusion gene (i.e., correct orientation and phase);
+4 for a "near fusion" (i.e., correct orientation but improper phase);
+2 for deletion-type structural variants that could cause the loss of any part of a gene;
+1 if the variant interrupts a gene.
Ranked structural variants that affect one or more cancer genes are highlighted in the report, and the full findings are provided in the supplemental files.
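The filtering and ranking rules above translate directly into an additive score. In this sketch the `SVCandidate` fields and function names are hypothetical. The thresholds and weights are the ones stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SVCandidate:
    read_support: int
    mean_mapq: float
    has_split_reads: bool = False
    plausible_fusion: bool = False    # correct orientation and phase
    near_fusion: bool = False         # correct orientation, improper phase
    deletion_hits_gene: bool = False  # deletion-type SV that could remove part of a gene
    interrupts_gene: bool = False


def passes_filters(sv: SVCandidate, min_reads: int = 6, min_mapq: float = 30.0) -> bool:
    """Minimum evidence: at least 6 supporting reads and mean MAPQ above 30."""
    return sv.read_support >= min_reads and sv.mean_mapq > min_mapq


def rank_score(sv: SVCandidate) -> int:
    """Additive heuristic score; larger scores rank higher."""
    score = 0
    if sv.read_support > 15:
        score += 2
    if sv.has_split_reads:
        score += 4
    if sv.plausible_fusion:
        score += 5
    if sv.near_fusion:
        score += 4
    if sv.deletion_hits_gene:
        score += 2
    if sv.interrupts_gene:
        score += 1
    return score


def rank_candidates(candidates: List[SVCandidate]) -> List[SVCandidate]:
    """Drop candidates that fail the filters, then sort by descending score."""
    kept = [sv for sv in candidates if passes_filters(sv)]
    return sorted(kept, key=rank_score, reverse=True)
```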
Provenance (Version 1.3)
The germline genotypes of the tumor and matched-normal samples are compared to determine whether the samples come from the same person. Up to 1,000 dbSNP loci are analyzed, and the percentage of sites with identical or compatible genotypes between samples is used to determine similarity. An incompatible genotype occurs when the tumor sample is heterozygous while the matched-normal sample is homozygous. Normal DNA samples with less than 3% incompatibility with the tumor DNA and less than 5% incompatibility with the RNA (when available) are considered to come from the same person.
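The provenance check reduces to counting incompatible genotypes at shared loci. This sketch makes two assumptions: genotypes are VCF-style strings ("0/0", "0/1", "1/1") keyed by dbSNP ID, and the first 1,000 shared loci in sorted order are used. The helper names and the locus selection are hypothetical. The 3% and 5% cut-offs come from the text.

```python
from typing import Dict, Optional

HETEROZYGOUS = {"0/1", "1/0"}
HOMOZYGOUS = {"0/0", "1/1"}


def incompatibility(
    tumor: Dict[str, str], normal: Dict[str, str], max_loci: int = 1000
) -> float:
    """Fraction of shared loci where the tumor is het but the normal is hom."""
    shared = sorted(set(tumor) & set(normal))[:max_loci]
    if not shared:
        raise ValueError("no shared loci to compare")
    incompatible = sum(
        1 for locus in shared
        if tumor[locus] in HETEROZYGOUS and normal[locus] in HOMOZYGOUS
    )
    return incompatible / len(shared)


def same_person(dna_incompat: float, rna_incompat: Optional[float] = None) -> bool:
    """Match if < 3% incompatible with tumor DNA and < 5% with RNA (when available)."""
    if dna_incompat >= 0.03:
        return False
    return rna_incompat is None or rna_incompat < 0.05
```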
Homozygosity of X (Version 1.0)
The germline genotypes at all single-nucleotide dbSNP loci on chromosome X are used to calculate the homozygosity percentage for each sample in this report. A locus is deemed homozygous in a sample if the maximum allele fraction (reference or alternate) exceeds 0.65. Only loci with a read depth of at least 10 in every sample are considered. If more than 75% of the chromosome X loci meeting these criteria are homozygous, the sample is likely male in origin, barring any copy number alterations on chromosome X.
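A sketch of the chromosome X homozygosity calculation follows. It assumes per-sample (reference, alternate) read counts at each chrX SNP and takes read depth as their sum. The data layout and function names are hypothetical. The 0.65, 10-read, and 75% thresholds are those stated above.

```python
from typing import Dict, List, Sequence, Tuple

AlleleCounts = Tuple[int, int]  # (reference-supporting reads, alternate-supporting reads)


def x_homozygosity(
    loci: List[Dict[str, AlleleCounts]],
    samples: Sequence[str],
    min_depth: int = 10,
    hom_fraction: float = 0.65,
) -> Dict[str, float]:
    """Fraction of usable chrX SNP loci that are homozygous in each sample."""
    # Keep only loci covered by at least min_depth reads in every sample.
    usable = [
        locus for locus in loci
        if all(s in locus and sum(locus[s]) >= min_depth for s in samples)
    ]
    if not usable:
        raise ValueError("no chromosome X loci meet the depth requirement")
    result = {}
    for s in samples:
        homozygous = sum(
            1 for locus in usable if max(locus[s]) / sum(locus[s]) > hom_fraction
        )
        result[s] = homozygous / len(usable)
    return result


def likely_male(homozygous_fraction: float, cutoff: float = 0.75) -> bool:
    """More than 75% homozygous chrX loci suggests a single X (barring chrX CNAs)."""
    return homozygous_fraction > cutoff
```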
Contrast Summary (Version 1.0)
The human genome reference build 37 (hg19) was used to align and analyze all sequencing data produced for this report. All SNV and small indel variants were annotated against common polymorphisms (found in at least 1% of the population) from dbSNP build 138.
Sequence Information (Version 1.7)
The human genome reference build 37 (hg19) was used to align and analyze all sequencing data produced for this report. All SNV and small indel variants were annotated against common polymorphisms (found in at least 1% of the population) from dbSNP build 142.
Throughout this report, gene names are colored according to their gene class: oncogenes are colored RED, tumor suppressors BLUE, genes that act as both oncogene and tumor suppressor PURPLE, and all other genes black. Gene classes are obtained from the COSMIC Cancer Gene Census [1].
For each aligned sample, the total, mapped, and duplicate read counts are collected. Average coverage is estimated for each exon (coding and non-coding), each intron, and each intergenic region. Whole-genome and exome coverages are calculated by aggregating the coverage estimates over all regions and over exome regions, respectively.
When these granular coverage estimates are unavailable, average coverage is calculated as the total number of mapped reads multiplied by the read length, divided by the total number of bases in the human genome or exome, as appropriate. Because this calculation still counts unalignable regions of the genome in the denominator, it may underestimate the true coverage.
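The two coverage estimates can be sketched as follows. When aggregating, this sketch weights each region's average coverage by its length; that weighting is an assumption about how the per-region estimates are combined. The fallback applies the formula from the text: mapped reads times read length, divided by the target size. Function names are hypothetical.

```python
from typing import Iterable, Tuple


def aggregate_coverage(regions: Iterable[Tuple[float, int]]) -> float:
    """Length-weighted mean of per-region average coverage.

    Each region is given as (average depth, length in bp).
    """
    covered_bases = 0.0
    total_length = 0
    for depth, length in regions:
        covered_bases += depth * length
        total_length += length
    if total_length == 0:
        raise ValueError("no regions supplied")
    return covered_bases / total_length


def naive_coverage(mapped_reads: int, read_length: int, target_bases: int) -> float:
    """Fallback estimate: mapped reads x read length / genome (or exome) size.

    Unalignable bases are still counted in target_bases, so this can
    underestimate the true coverage.
    """
    return mapped_reads * read_length / target_bases
```

For instance, 1,000,000 mapped 100 bp reads over a 50 Mb target give an estimated average coverage of 2x.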
In addition, a variety of summary metrics are computed for each position within the sequencing reads and shown in the figures in the appendix. These metrics can help identify problems with the input sequence. The base composition gives the percentage of reads carrying a particular base at a given read position. It can reveal the presence of an adapter sequence that should be clipped from the input. Base quality is split into three bins: High if q > 19, Low if q < 6, and Average otherwise. Base quality generally worsens toward the end of sequencing reads, so such a pattern is not in itself a cause for concern. Finally, per-base alignment statistics are computed for the following alignment categories: deletion (D), insertion (I), skipped (N), soft clip (S), hard clip (H), and padding (P).
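The per-position base-composition and quality-bin metrics can be sketched as follows, assuming base qualities are already decoded to integer Phred scores. Function names are hypothetical. The High/Low/Average cut-offs are those given in the text.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def quality_bin(q: int) -> str:
    """Bin a Phred base quality: High if q > 19, Low if q < 6, else Average."""
    if q > 19:
        return "High"
    if q < 6:
        return "Low"
    return "Average"


def per_position_metrics(
    reads: Sequence[str], quals: Sequence[Sequence[int]]
) -> Tuple[List[Dict[str, float]], List[Counter]]:
    """Per read position: percentage of each base and counts per quality bin."""
    max_len = max((len(r) for r in reads), default=0)
    composition: List[Dict[str, float]] = []
    quality_bins: List[Counter] = []
    for pos in range(max_len):
        # Only reads long enough to reach this position contribute.
        bases = Counter(r[pos] for r in reads if pos < len(r))
        n = sum(bases.values())
        composition.append({b: 100.0 * c / n for b, c in sorted(bases.items())})
        quality_bins.append(
            Counter(quality_bin(q[pos]) for q in quals if pos < len(q))
        )
    return composition, quality_bins
```

A base whose percentage stays far from about 25% across the first positions of many reads is one pattern that can point to residual adapter sequence.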
Sample Contamination (Version 1.0)
All somatic small variants detected in the tumor sample are checked for how common they are in the global human population. Variants found in more than 5.0% of the population are considered common. The number of common sites relative to the total number of somatic small variants is used to determine whether the tumor sample is contaminated with DNA from an unrelated individual. Genuine somatic mutations rarely coincide with common population polymorphisms, so a high proportion of common sites points to foreign DNA.
In addition, somatic variants present in the matched-normal sample are tallied to determine whether DNA from the tumor sample has contaminated the matched normal. High levels of such contamination can reduce the sensitivity of somatic variant detection, because tumor variants that also appear in the normal can be mistaken for germline variants and filtered out. A somatic variant is classified as present in the matched-normal sequencing data if the variant allele is found in more than 5.0% of the total reads at that site.
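The two contamination checks can be sketched as follows. The sketch assumes that "found in more than 5.0% of the population" means a population allele frequency above 0.05. It returns the raw fraction and count because the text does not state the cut-off at which a sample is called contaminated. Names and the variant key format are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def common_variant_fraction(
    somatic: Iterable[Variant],
    population_af: Dict[Variant, float],
    common_af: float = 0.05,
) -> float:
    """Share of somatic calls that are common (> 5%) population variants."""
    calls = list(somatic)
    if not calls:
        return 0.0
    common = sum(1 for v in calls if population_af.get(v, 0.0) > common_af)
    return common / len(calls)


def present_in_normal(alt_reads: int, total_reads: int, min_vaf: float = 0.05) -> bool:
    """A somatic variant counts as present in the normal if its VAF exceeds 5%."""
    return total_reads > 0 and alt_reads / total_reads > min_vaf


def count_present_in_normal(normal_counts: Iterable[Tuple[int, int]]) -> int:
    """Tally somatic sites whose (alt reads, total reads) in the normal pass the cut-off."""
    return sum(1 for alt, total in normal_counts if present_in_normal(alt, total))
```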

RNA Quality (Version 1.0)
Gene- and exon-level coverages are computed as the average read depth across all coding exons defined by the canonical isoform of the gene.
The expression status of genes with mutant alleles is determined using the lower and upper 5th percentiles of per-gene RSEM TPM values from a collection of RNA sequencing datasets of TCGA normal samples. A gene is classified as "Over-expressed" if its TPM exceeds the gene's upper 5th percentile, "Underexpressed" if its TPM is below the lower 5th percentile, "Not expressed" if its TPM equals zero, and "Normal" otherwise. If a gene's upper or lower 5th percentile is unavailable, its expression status is classified as "N/A".
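The expression-status rule can be sketched as follows. Two points are assumptions. First, the reference percentiles are computed here with Python's `statistics.quantiles`, since the interpolation method used for the TCGA reference is not stated. Second, the "N/A" and zero-TPM checks are applied before the percentile comparisons, because the text does not give an order of precedence. Function names are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple


def reference_percentiles(normal_tpms: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper 5th percentiles (i.e. 5th and 95th) of normal-sample TPMs."""
    cuts = statistics.quantiles(normal_tpms, n=20)  # 19 cut points at 5% steps
    return cuts[0], cuts[-1]


def expression_status(
    tpm: float, lower: Optional[float], upper: Optional[float]
) -> str:
    """Classify a gene's TPM against its normal-tissue reference range."""
    if lower is None or upper is None:
        return "N/A"  # reference percentiles unavailable for this gene
    if tpm == 0:
        return "Not expressed"
    if tpm > upper:
        return "Over-expressed"
    if tpm < lower:
        return "Underexpressed"
    return "Normal"
```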

About This Test
The following paragraph provides references to the published methods employed in preparation of the DNA and RNA sequencing data prior to analysis by the NantOmics Contraster analysis pipeline.

Disclaimer
This report is for informational purposes only and is not intended to diagnose or treat any disease. Treatment of patients with any agent mentioned in this report is solely at the discretion of the treating physician. The presence of a genomic alteration in a gene targeted by an agent does not by itself indicate sensitivity to that agent. The accuracy of this report depends on the data provided, which may contain errors introduced during sequencing or downstream analysis. Any findings here should be verified with a qualified test in a CLIA-certified laboratory.

Descriptions of Curated Findings (biomarkers detected in sample)
high TYMP expression vs. low TYMP expression
TYMP encodes thymidine phosphorylase, an enzyme that promotes angiogenesis and endothelial cell growth. Thymidine phosphorylase catalyzes the reversible phosphorolysis of thymidine to thymine and 2-deoxyribose 1-phosphate, a step in pyrimidine catabolism.

high MGMT expression vs. low MGMT expression
The MGMT gene encodes O6-methylguanine-DNA methyltransferase, a DNA repair enzyme that protects cells against mutagenic DNA-alkylating agents.

KRAS mutation in codon 12 or 13
Mutations at codon 12 or 13 (within exon 2) are oncogenic activating mutations that have been observed across numerous tumor types and are associated with resistance to some EGFR inhibitor therapies. The KRAS gene encodes a small membrane-associated GTPase involved in regulating PI3K and RAF/MEK/ERK pathway signaling.

KRAS G12X
KRAS G12X mutations are among the most frequently occurring missense mutations in KRAS. They are observed in up to 95% of pancreatic cancers and have also been identified across numerous other cancer types. Substitutions at position G12 are oncogenic activating mutations and are associated with resistance to some EGFR inhibitor therapies.
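The two KRAS curation rules ("mutation in codon 12 or 13" and "G12X") can be sketched as simple matchers on protein-change strings. This is a hypothetical helper, not the report's curation logic. It only recognizes single-residue missense changes in one-letter notation (e.g., "G12D" or "p.G12D"). It ignores other alteration types, such as in-frame insertions, that could also affect codons 12 and 13.

```python
import re
from typing import Optional, Tuple

# One-letter amino-acid codes with an optional "p." prefix, e.g. "G12D" or "p.G12D".
_MISSENSE = re.compile(
    r"^(?:p\.)?([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def parse_missense(change: str) -> Optional[Tuple[str, int, str]]:
    """Return (reference residue, codon, alternate residue), or None."""
    match = _MISSENSE.match(change.strip())
    if match is None:
        return None
    ref, codon, alt = match.group(1), int(match.group(2)), match.group(3)
    return (ref, codon, alt) if ref != alt else None  # ignore synonymous changes


def is_codon_12_or_13(change: str) -> bool:
    parsed = parse_missense(change)
    return parsed is not None and parsed[1] in (12, 13)


def is_g12x(change: str) -> bool:
    parsed = parse_missense(change)
    return parsed is not None and parsed[0] == "G" and parsed[1] == 12
```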