A SURVEY OF DISCRETE DIFFUSION METHODS FOR NATURAL LANGUAGE AND DNA SEQUENCE GENERATION
Publication date
Authors
DOI
Document Type
Master Thesis
License
CC-BY-NC-ND
Abstract
While diffusion models have achieved state-of-the-art results in continuous domains like image generation, their application to inherently discrete data such as natural language and DNA presents unique challenges. Continuous-space adaptations often introduce artifacts and complexities, motivating a focused investigation into models that operate directly on discrete data. This survey provides a comprehensive overview of the methods and advancements in the field of discrete diffusion models. We review the foundational formulations, including Denoising Diffusion Probabilistic Models (DDPMs) and Score-Based Generative Models (SGMs), and their theoretical adaptations to discrete state spaces. We then chronologically survey advancements across two key modalities, Natural Language Processing and DNA sequences, examining critical research topics such as novel forward processes and the adaptation of pre-trained language models. By synthesizing these developments and outlining future research directions, this paper offers a structured overview of this rapidly evolving field.
Keywords
Diffusion, Discrete Diffusion, Diffusion Language Model, Natural Language Processing, Generative Modeling, Denoising Diffusion Probabilistic Models, Score-Based Generative Models, DNA Generation