TY - GEN
T1 - Differentially-private mining of moderately-frequent high-confidence association rules
AU - Maruseac, Mihai
AU - Ghinita, Gabriel
N1 - Publisher Copyright: Copyright © 2015 ACM.
PY - 2015/3/2
Y1 - 2015/3/2
AB - Association rule mining allows discovering of patterns in large data repositories, and benefits diverse application domains such as healthcare, marketing, social studies, etc. However, mining datasets that contain data about individuals may cause significant privacy breaches, and disclose sensitive information about one's health status, political orientation or alternative lifestyle. Recent research addressed the privacy threats that arise when mining sensitive data, and several techniques allow data mining with differential privacy guarantees. However, existing methods only discover rules that have very large support, i.e., occur in a large fraction of the dataset transactions (typically, more than 50%). This is a serious limitation, as numerous high-quality rules do not reach such high frequencies (e.g., rules about rare diseases, or luxury merchandise). In this paper, we propose a method that focuses on mining high-quality association rules with moderate and low frequencies. We employ a novel technique for rule extraction that combines the exponential mechanism of differential privacy with reservoir sampling. The proposed algorithm allows us to directly mine association rules, without the need to compute noisy supports for large numbers of itemsets. We provide a privacy analysis of the proposed method, and we perform an extensive experimental evaluation which shows that our technique is able to sample low- and moderate-support rules with high precision.
UR - https://www.scopus.com/pages/publications/84928153247
U2 - 10.1145/2699026.2699102
DO - 10.1145/2699026.2699102
M3 - Conference contribution
AN - SCOPUS:84928153247
T3 - CODASPY 2015 - Proceedings of the 5th ACM Conference on Data and Application Security and Privacy
SP - 13
EP - 24
BT - CODASPY 2015 - Proceedings of the 5th ACM Conference on Data and Application Security and Privacy
PB - Association for Computing Machinery
T2 - 5th ACM Conference on Data and Application Security and Privacy, CODASPY 2015
Y2 - 2 March 2015 through 4 March 2015
ER -