PrivPetal: Relational Data Synthesis via Permutation Relations

Research output: Contribution to journalArticlepeer-review

Abstract

Releasing relational databases while preserving privacy is an important research problem with numerous applications. A canonical approach is to generate synthetic data under differential privacy (DP), which provides a strong, rigorous privacy guarantee. The problem is particularly challenging when the data involve not only entities (e.g., represented by records in tables) but also relationships (represented by foreign-key references), since if we generate random records for each entity independently, the resulting synthetic data usually fail to exhibit realistic relationships. The current state of the art, PrivLava, addresses this issue by generating random join key attributes through a sophisticated expectation-maximization (EM) algorithm. This method, however, is rather costly in terms of privacy budget consumption, due to the numerous EM iterations needed to retain high data utility. Consequently, the privacy cost of PrivLava can be prohibitive for some real-world scenarios.

We observe that the utility of the synthetic data is inherently sensitive to the join keys: changing the primary key of a record t, for example, causes t to join with a completely different set of partner records, which may lead to a significant distribution shift of the join result. Consequently, join keys need to be kept highly accurate, meaning that enforcing DP on them inevitably incurs a high privacy cost. In this paper, we explore a different direction: synthesizing a flattened relation and subsequently decomposing it down to base relations, which eliminates the need to generate join keys. Realizing this idea is challenging, since naively flattening a relational schema leads to a rather high-dimensional table, which is hard to synthesize accurately with differential privacy.

We present a sophisticated PrivPetal approach that addresses the above issues via a novel concept: permutation relation, which is constructed as a surrogate to synthesize the flattened relation, avoiding the generation of a high-dimensional relation directly. The synthesis is done using a refined Markov random field mechanism, backed by fine-grained privacy analysis. Extensive experiments using multiple real datasets and the TPC-H benchmark demonstrate that PrivPetal significantly outperforms existing methods in terms of aggregate query accuracy on the synthetic data.
Original languageEnglish
JournalCoRR
Volumeabs/2503.22970
DOIs
Publication statusPublished - 18 Jun 2025

Fingerprint

Dive into the research topics of 'PrivPetal: Relational Data Synthesis via Permutation Relations'. Together they form a unique fingerprint.

Cite this