Provable Privacy for Database Generation: an Information Theoretic Approach

Publication date

DOI

Document Type

Master Thesis

Collections

Open Access logo

License

CC-BY-NC-ND

Abstract

Many methods exist to avoid disclosing sensitive information when releasing a database. However these methods either cannot guarantee that the information of individuals is secure or are aimed at specific use cases. In this paper we develop a method which is both provably private and retains the overall form of the original database. To achieve this we derive a privacy measure, epsilon-dependence. Intuitively, epsilon-dependence requires that the input and output databases are nearly independent. We show that epsilon-dependence can be seen as an information theoretic refinement of differential privacy. We then adapt the KRIMP algorithm to generate databases while satisfying epsilon-dependence. We show through experiments that the generated databases are comparable to the original databases when performing machine learning or itemset mining tasks. The results are especially good on larger databases.

Keywords

Citation