How realistic is my synthetic data? A qualitative approach.

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

Missing values are one of the most common challenges in data analytics. Many techniques have therefore been proposed to fill in missing values, a task known as "data imputation". Recent studies on synthetic data generation show that Generative Adversarial Networks (GANs) can address this problem as follows: for each example in the original data, generate a synthetic example that keeps the existing values and supplies values for the features that are missing. However, to confirm whether GANs offer a significant improvement over traditional data imputation techniques, we need a way to measure the quality of the generated examples. That quality can be assessed by determining how realistic the synthetic data is compared to the original examples. In this project, we develop a tool for measuring the quality of synthetic data, and we use it to compare data generated by GANs with data produced by other synthetic data generation techniques.
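The imputation scheme described in the abstract (keep every observed value, take only the missing ones from a generated example) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the thesis: the function names impute_with_generator and column_mean_generator are hypothetical, and a simple column-mean proposal with small noise stands in for a trained GAN generator.

```python
import math
import random


def impute_with_generator(row, generate):
    """Keep observed values; take only the missing ones (NaN) from a generated row."""
    synthetic = generate(row)
    return [s if math.isnan(x) else x for x, s in zip(row, synthetic)]


def column_mean_generator(data, seed=0):
    """Hypothetical stand-in for a trained generator (assumption, not a GAN).

    It proposes the observed mean of each column plus a little Gaussian noise.
    Every column is assumed to have at least one observed value.
    """
    means = []
    for j in range(len(data[0])):
        observed = [r[j] for r in data if not math.isnan(r[j])]
        means.append(sum(observed) / len(observed))
    rng = random.Random(seed)

    def generate(row):
        # A real generator would condition on the observed part of `row`;
        # this stand-in ignores it.
        return [m + rng.gauss(0.0, 0.01) for m in means]

    return generate


nan = float("nan")
data = [
    [1.0, 2.0, nan],
    [2.0, nan, 6.0],
    [3.0, 4.0, 8.0],
]

generate = column_mean_generator(data)
imputed = [impute_with_generator(row, generate) for row in data]
```

Observed entries pass through unchanged, so any difference between the original and the imputed data is confined to the positions that were missing. Those filled-in positions are the ones whose realism a quality measure would need to judge.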