Deep Learning of large-scale genomic context for mutational signature assignment
Document Type
Master Thesis
License
CC-BY-NC-ND
Abstract
Mutational signatures, that is, the patterns of somatic base changes left by endogenous enzymes, environmental mutagens, and DNA-repair failures, are a powerful lens for understanding cancer etiology and clinical phenotype. Classical decomposition methods (e.g., NMF-based extraction and refitting) provide interpretable, sample-level mixtures of signatures but suffer from non-unique solutions, poor performance on low-mutation samples, and a limited capacity to model interactions with genomic context. This literature review evaluates whether deep learning (DL) approaches can improve the detection of mutational processes and, critically, whether they can help assign individual mutations to their causal processes. We synthesize findings from studies of autoencoders (denoising/sparse AEs, VAEs, and explainable variants), convolutional and attention-based classifiers, and hybrid multimodal frameworks. The reviewed DL models show clear advantages: nonlinear encodings and denoising improve sensitivity to weak or overlapping signatures, attention and embedding methods identify mutation subsets that align with known etiologies, and multimodal VAEs disambiguate processes by integrating orthogonal data (indels, copy-number alterations (CNAs), expression, or chromatin state). Practical gains include enhanced de novo signature discovery, improved performance on low-count panels, and interpretable per-sample factors that correlate with COSMIC signatures. Nevertheless, DL introduces challenges of its own, such as learning of technical artifacts, limited interpretability of latent spaces, and dependence on cohort composition; rigorous calibration, artifact control, and multimodal inputs therefore remain essential. We conclude that DL represents a notable advance for mutational-process detection and moves the field closer to reliable per-mutation attribution.
Keywords
mutational signatures; deep learning; autoencoders; attention models; mutational process attribution; multimodal integration