Background Recent advances in high-parameter spatial biology have yielded a rapidly growing new class of biological data, allowing researchers to more comprehensively characterize cellular state and morphology in native tissue context. However, spatial biology lacks a cohesive data abstraction on which to build novel computational tools and algorithms, making it difficult to fully leverage these emergent data. Here, we present emObject, a domain-specific data abstraction for spatial biology data and experiments. We demonstrate the simplicity, flexibility, and extensibility of emObject for a range of spatial omics data types, including the analysis of Visium, MIBI, and CODEX data, as well as for integrated spatial multiomic experiments.
Methods emObject is designed to leverage the rich Python data science ecosystem, providing a familiar API to users. Because many spatial omics assays generate imaging data, either as the primary data (e.g. multiplexed proteomic or transcriptomic imaging) or as accessory data (e.g. 10x Visium), we implemented an efficient image representation using Zarr, an open-source standard for high-dimensional array data [1]. Our image container keeps large images on disk rather than in memory and loads only the relevant chunks into memory when needed. Additionally, because Zarr implements efficient compression, our container reduces disk requirements for imaging data. In assays where image channels correspond to biomarkers or genes, images can also be aligned along a shared variable axis. Other emObject attributes are built using NumPy and pandas, giving them a familiar API [2–4].
Results To highlight the utility of emObject, we perform exemplar analyses on spatial datasets including 10x Visium, H&E, NanoString GeoMx, CODEX, and MIBI. We use these examples to demonstrate two integrated analyses: one between H&E and spatial transcriptomics, and a second between spatial transcriptomics and proteomics.
Conclusions As spatial data modalities expand and become more commonplace in the omics data stack, integrative multiomic analysis of these datasets provides a unique opportunity to capture a more complete view of how the transcriptome maps to proteomic or histological observables and vice versa. The development of emObject is a step towards building a unified data science ecosystem for spatial biology and accelerating the pace of scientific discovery.
1. Miles A, et al. zarr-developers/zarr-python: v2.4.0. Zenodo (2020). doi:10.5281/zenodo.3773450.
2. The pandas development team. pandas-dev/pandas: Pandas. Zenodo (2023). doi:10.5281/zenodo.7979740.
3. Harris CR, et al. Array programming with NumPy. Nature 2020;585:357–362.
4. McKinney W. Data structures for statistical computing in Python. In: Proceedings of the 9th Python in Science Conference, 56–61 (SciPy, 2010). doi:10.25080/Majora-92bf1922-00a.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0/.