Background Analysis reproducibility and transparency are pillars of robust and trustworthy scientific results. The dependability of these results is crucial in clinical settings, where they may guide high-impact decisions affecting patient health. Independent reproduction of computational results has been problematic and can place a heavy burden on those attempting it. Complications may arise from, among other causes: 1) insufficiently described parameters, 2) vague methods, or 3) undisclosed scripts required to generate final outputs. Here we introduce RAFT (Reproducible Analyses Framework and Tools), a framework for immuno-oncology biomarker development built with Python 3 and Nextflow DSL2. RAFT aims to enable end-to-end reproducibility of entire computational analyses in multiple contexts (e.g. local, compute cluster, or cloud) with minimal overhead through a focus on usability (figures 1 and 2).
Methods RAFT builds upon Nextflow’s DSL2 module-based approach to workflows by providing a ‘project’ context in which users can add metadata, load references, and build up their analysis step-by-step. RAFT also provides pre-built modules containing workflows commonly used in immuno-oncology analyses (e.g. TCR/BCR repertoire reconstruction and HLA typing) and aids users through automatic module dependency resolution. Transparency is gained by having a single end-to-end script containing all steps and parameters, as well as a single configuration file. Finally, RAFT allows users to create and share a package of project metadata files including the main script, all input and output checksums, all modules, and the RAFT steps required to create the analysis. This package, coupled with any required input files, can be used to recreate the analysis or to expand it with additional datasets or alternative parameters.
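The role of input and output checksums in such a package can be illustrated with a minimal Python sketch. It is not RAFT's implementation: the function names (`sha256_of`, `build_manifest`, `changed_files`), the choice of SHA-256, and the dictionary layout of the manifest are all assumptions made for illustration. The idea is that a manifest recorded when the analysis is packaged can later be compared against a recipient's files to confirm that the same inputs were used and the same outputs were regenerated.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under `root` (relative POSIX path) to its checksum.

    Hypothetical helper: the manifest layout is an assumption, not RAFT's format.
    """
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed, or altered relative to `manifest`."""
    current = build_manifest(root)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

An empty list from `changed_files` indicates that the project directory matches the recorded manifest; any listed path points to a file that differs from the packaged analysis.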
Results RAFT has been used by our computational team to create an immuno-oncology meta-analysis submitted to SITC 2020. A simple, proof-of-concept analysis has been used to establish RAFT’s ability to support reproducibility by running locally on laptop computers, on multiple research compute clusters, and on the Google Cloud Platform.
Conclusions The RAFT platform shows promising capabilities to support rapid and reproducible research within the field of immuno-oncology. Several features remain in development and testing, including additional immunogenomics feature modules for variant/fusion detection and HLA/peptide binding affinity estimation. Other functionality in development will enable collaborators to use remote Git repository hosting (e.g. GitHub or GitLab) to jointly and iteratively modify an analysis.
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.