Navigating the Complexity of Data Imputation in Spatial Transcriptomics: Strategies, Challenges, and Future Directions

Publication date

DOI

Document Type

Master Thesis

Collections

License

CC-BY-NC-ND

Abstract

Spatially resolved transcriptomics (SRT) has transformed our understanding of the complex molecular architecture of tissues by measuring gene expression within its spatial context, which is crucial for unraveling cellular heterogeneity and intercellular communication. However, SRT datasets frequently suffer from sparsity and dropout events, which complicate the interpretation of gene expression across tissues. This review explores the current landscape of computational strategies for imputing missing data in SRT, focusing on models that address sparsity and dropout in spatially resolved contexts. We categorize these models into three main approaches: Integration into Shared Latent Space, Alignment-based Imputation, and Reference-free Spatially Informed Imputation Models. Each category uses a distinct methodology to infer missing gene expression values. We critically examine the underlying assumptions, advantages, and limitations of these models, assess their performance through recent benchmarking efforts, and provide recommendations for their application in biological research. The review provides an overview of the strategies and a qualitative comparison between them, and highlights the need for a robust benchmarking study. It thereby offers a comprehensive outline of the field and direction for future efforts to derive more accurate and biologically meaningful insights from incomplete but spatially contextualized datasets.

Keywords

Spatially resolved transcriptomics, data imputation

Citation