485 RAFT: A framework to support rapid and reproducible immuno-oncology analyses
Steven Vensko, Benjamin Vincent and Dante Bortone
University of North Carolina, Chapel Hill, NC, USA

Abstract

Background Analysis reproducibility and transparency are pillars of robust and trustworthy scientific results. The dependability of these results is crucial in clinical settings, where they may guide high-impact decisions affecting patient health. Independent reproduction of computational results has been problematic and can burden those attempting it. Complications may arise from: 1) insufficiently described parameters, 2) vague methods, or 3) unpublished scripts required to generate final outputs, among others. Here we introduce RAFT (Reproducible Analyses Framework and Tools), a framework for immuno-oncology biomarker development built with Python 3 and Nextflow DSL2. RAFT aims to enable end-to-end reproducibility of entire computational analyses in multiple contexts (e.g. local, compute cluster, or cloud) with minimal overhead through a focus on usability (figures 1 and 2).

Methods RAFT builds upon Nextflow’s DSL2 module-based approach to workflows by providing a ‘project’ context in which users can add metadata, load references, and build up their analysis step-by-step. RAFT also has pre-built modules with workflows commonly utilized in immuno-oncology analyses (e.g. TCR/BCR repertoire reconstruction and HLA typing) and aids users through automatic module dependency resolution. Transparency is gained by having a single end-to-end script containing all steps and parameters as well as a single configuration file. Finally, RAFT allows users to create and share a package of project metadata files including the main script, all input and output checksums, all modules, and the RAFT steps required to create the analysis. This package, coupled with any required input files, can be used to recreate the analysis or to expand it with additional datasets or alternative parameters.
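To illustrate why recording input and output checksums supports reproduction, the following is a minimal Python sketch of a checksum manifest. It is not RAFT's implementation: the function names (`sha256_of`, `build_manifest`, `changed_files`), the choice of SHA-256, and the manifest layout are all assumptions made for illustration, since the abstract does not specify how RAFT computes or stores checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX path) to its checksum.

    Hypothetical layout: RAFT's actual package format is not described
    in the abstract.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(recorded: dict[str, str], root: Path) -> list[str]:
    """List files that were added, removed, or altered since `recorded`."""
    current = build_manifest(root)
    names = recorded.keys() | current.keys()
    return sorted(n for n in names if recorded.get(n) != current.get(n))
```

A collaborator who receives a recorded manifest alongside the input files could call `changed_files(recorded, root)` before re-running an analysis; an empty list indicates that the files on disk match what the original author used.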

Results RAFT has been used by our computational team to create an immuno-oncology meta-analysis submitted to SITC 2020. A simple, proof-of-concept analysis has been used to establish RAFT’s ability to support reproducibility by running locally on laptop computers, on multiple research compute clusters, and on the Google Cloud Platform.

Abstract 485 Figure 1

Example RAFT Usage. Users define their required inputs, build their analysis, and run their analysis using the RAFT command-line interface. The metadata from the analysis can then be shared through a RAFT package with collaborators or interested third parties in order to reproduce or expand upon the initial results.

Abstract 485 Figure 2

End-to-end RAFT. RAFT supports end-to-end analysis development through a ‘project’ structure. Users link local required files (e.g. FASTQs, references or manifests) into their appropriate /raft subdirectory. (1) Projects are initiated using the raft init-project command, which creates and populates a project-specific directory. (2–3) Users then load required metadata (e.g. sample manifests or clinical data) and references (e.g. alignment references) into the project using the raft load-metadata or raft load-reference commands, respectively. (4) Modules consisting of tool-specific and topical workflows are cloned from a collection of remote repositories into the project using raft load-module. (5) Specific processes and workflows from previously loaded modules are added to the analysis (main.nf) through raft add-step. Users can then modify main.nf with their desired parameters and execute the workflow using raft run-workflow. (6) Additionally, RAFT allows an iterative approach where results from RAFT can be analyzed and modified through RStudio and re-run through Nextflow.
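The ordering of the commands in this caption can be sketched as a small prerequisite check in Python. The subcommand names are taken from the caption; everything else is an assumption for illustration. In particular, the `PREREQUISITES` table is one reading of the numbered steps, the function `first_out_of_order` is hypothetical, and the sketch neither invokes RAFT nor models any command arguments, which the abstract does not describe.

```python
# Subcommand names come from the figure caption. The prerequisite rules are
# an assumed interpretation of the numbered steps, not documented RAFT behavior.
PREREQUISITES = {
    "init-project": set(),
    "load-metadata": {"init-project"},
    "load-reference": {"init-project"},
    "load-module": {"init-project"},
    "add-step": {"load-module"},
    "run-workflow": {"add-step"},
}


def first_out_of_order(steps):
    """Return the first step whose prerequisites have not yet run, else None."""
    seen = set()
    for step in steps:
        if step not in PREREQUISITES:
            raise ValueError(f"unknown step: {step}")
        if PREREQUISITES[step] - seen:
            return step
        seen.add(step)
    return None


# The sequence described in the caption satisfies every assumed prerequisite.
caption_order = [
    "init-project",
    "load-metadata",
    "load-reference",
    "load-module",
    "add-step",
    "run-workflow",
]
```

Under these assumed rules, `first_out_of_order(caption_order)` returns `None`, whereas a plan that tries `add-step` before any `load-module` is flagged at `add-step`.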

Conclusions The RAFT platform shows promising capabilities to support rapid and reproducible research within the field of immuno-oncology. Several features remain in development and testing, including additional immunogenomics feature modules for variant/fusion detection and HLA/peptide binding affinity estimation. Other functionality in development will enable collaborators to use remote Git repository hosting (e.g. GitHub or GitLab) to jointly and iteratively modify an analysis.

http://creativecommons.org/licenses/by-nc/4.0/

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
