Abstract
Background As the problem space for spatial biology becomes increasingly complex, so does the tooling for analyzing multi-omic workflows. Assay providers focus heavily on extracting as much data as possible from samples, with little consideration for the additional time and computational resources researchers then need to extract findings. To alleviate this pain, we have developed a novel experience for rapid end-to-end processing of multiple Visium Flowcell readouts, from raw sequencing data to automated study-wide spatial analysis (SA) insights.
Methods For preprocessing, we packaged 10x Genomics’ Space Ranger Spatial Gene Expression toolkit in a Docker image. The resulting container runs as a parallelized AWS Batch job, orchestrated by AWS Step Functions and deployed programmatically with AWS CloudFormation templates, which reduces the conversion and deconvolution of Visium data to a streamlined 1-click job. The resulting data is transferred via Amazon Simple Storage Service (S3) into the Posit Workbench environment for preliminary SA. Workbench relies on a managed Amazon EKS Kubernetes cluster that hosts client sessions and a central web server running on EC2 for routing and orchestration. Files are persisted to an EC2-backed Network File System (NFS) server, allowing concurrent access and file sharing across users and sessions.
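As an illustration of the orchestration described in Methods, the following minimal sketch builds an Amazon States Language definition that fans out one AWS Batch job per sample. It is not the authors' actual template: the function name `build_state_machine`, the state names, the input fields `samples`, `sample_id`, and `fastq_s3_uri`, and the environment variable names are all hypothetical, and the ARNs are placeholders supplied by the caller.

```python
import json


def build_state_machine(job_queue_arn: str,
                        job_definition_arn: str,
                        max_concurrency: int = 4) -> dict:
    """Return a state machine definition that submits one Batch job per sample.

    Assumed (hypothetical) execution input:
    {"samples": [{"sample_id": "...", "fastq_s3_uri": "s3://..."}, ...]}
    """
    run_job = {
        "Type": "Task",
        # Step Functions service integration that submits a Batch job
        # and waits for it to finish before the state completes.
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName.$": "$.sample_id",
            "JobQueue": job_queue_arn,
            "JobDefinition": job_definition_arn,
            "ContainerOverrides": {
                # Hypothetical variables the container entrypoint would read.
                "Environment": [
                    {"Name": "SAMPLE_ID", "Value.$": "$.sample_id"},
                    {"Name": "FASTQ_S3_URI", "Value.$": "$.fastq_s3_uri"},
                ]
            },
        },
        "End": True,
    }
    return {
        "Comment": "Sketch: one preprocessing job per Visium sample",
        "StartAt": "ProcessSamples",
        "States": {
            "ProcessSamples": {
                "Type": "Map",                      # iterate over the samples
                "ItemsPath": "$.samples",
                "MaxConcurrency": max_concurrency,  # cap on parallel jobs
                "Iterator": {
                    "StartAt": "RunPreprocessing",
                    "States": {"RunPreprocessing": run_job},
                },
                "End": True,
            }
        },
    }


def to_json(definition: dict) -> str:
    """Serialize the definition, e.g. for embedding in a deployment template."""
    return json.dumps(definition, indent=2)
```

A Map state is used here because it expresses the per-sample parallelism directly; whether the authors used a Map state, Batch array jobs, or another mechanism is not stated in the abstract.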
Results The SA platform consists of three parts that automate SA insights and provide low-code workspaces for further exploration. First, the automated spatial analysis workflow generates HTML insight reports with figures. This workflow consists of preliminary data quality control, batch-effect correction, clustering, differential expression, spatial neighborhood detection, gene module analysis, and gene ontology enrichment analysis. Second, low-code RMarkdown and IPython notebooks simplify exploring additional analyses in a tailored Docker environment. Lastly, results are integrated into the Enable Medicine Portal for spatial visualization overlaid on image data.
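To make the first workflow step concrete, the sketch below shows one common form of preliminary quality control: keeping only spots with enough total counts and enough detected genes. It is a simplified illustration, not the workflow's actual implementation. The function name `passing_spots`, the nested-dictionary input layout, and both threshold defaults are assumptions chosen for the example.

```python
from typing import Dict, List


def passing_spots(spot_counts: Dict[str, Dict[str, int]],
                  min_total_counts: int = 500,
                  min_genes: int = 200) -> List[str]:
    """Return barcodes of spots that pass simple count-based QC.

    spot_counts maps a spot barcode to a {gene: count} dictionary.
    The thresholds are illustrative defaults, not values from the study.
    """
    kept = []
    for barcode, gene_counts in spot_counts.items():
        total = sum(gene_counts.values())                         # counts per spot
        detected = sum(1 for c in gene_counts.values() if c > 0)  # genes with any count
        if total >= min_total_counts and detected >= min_genes:
            kept.append(barcode)
    return kept
```

In practice this filtering is usually done on a sparse count matrix with a dedicated single-cell or spatial analysis library; the plain-dictionary version only makes the criterion explicit.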
Conclusions We showed the effectiveness of this system by analyzing a cohort of three patients with adenocarcinoma and one patient with signet ring cell carcinoma (SRCC), each with one primary tumor sample and one matched adjacent normal tissue sample. The result was a spatially resolved analysis of the tumor microenvironment across clusters, diseases, and tissue types. This pipeline identified differences in spatial arrangement between tumor and normal tissue and demonstrated the spatial expression patterns that differentiate adenocarcinoma from SRCC.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.