TY - GEN
T1 - Privtrie
T2 - 34th IEEE International Conference on Data Engineering, ICDE 2018
AU - Wang, Ning
AU - Xiao, Xiaokui
AU - Yang, Yin
AU - Hoang, Ta Duy
AU - Shin, Hyejin
AU - Shin, Junbum
AU - Yu, Ge
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/24
Y1 - 2018/10/24
N2 - A mobile operating system often needs to collect frequent new terms from users in order to build and maintain a comprehensive dictionary. Collecting keyboard usage data, however, raises privacy concerns. Local differential privacy (LDP) has been established as a strong privacy standard for collecting sensitive information from users. Currently, the best known solution for LDP-compliant frequent term discovery transforms the problem into collecting n-grams under LDP, and subsequently reconstructs terms from the collected n-grams by modelling the latter into a graph, and identifying cliques on this graph. Because the transformed problem (i.e., collecting n-grams) is very different from the original one (discovering frequent terms), the end result has poor utility. Further, this method is also rather expensive due to clique computation on a large graph. In this paper we tackle the problem head on: our proposal, PrivTrie, directly collects frequent terms from users by iteratively constructing a trie under LDP. While the methodology of building a trie is an obvious choice, obtaining an accurate trie under LDP is highly challenging. PrivTrie achieves this with a novel adaptive approach that conserves privacy budget by building internal nodes of the trie with the lowest level of accuracy necessary. Experiments using real datasets confirm that PrivTrie achieves high accuracy on common privacy levels, and consistently outperforms all previous methods.
AB - A mobile operating system often needs to collect frequent new terms from users in order to build and maintain a comprehensive dictionary. Collecting keyboard usage data, however, raises privacy concerns. Local differential privacy (LDP) has been established as a strong privacy standard for collecting sensitive information from users. Currently, the best known solution for LDP-compliant frequent term discovery transforms the problem into collecting n-grams under LDP, and subsequently reconstructs terms from the collected n-grams by modelling the latter into a graph, and identifying cliques on this graph. Because the transformed problem (i.e., collecting n-grams) is very different from the original one (discovering frequent terms), the end result has poor utility. Further, this method is also rather expensive due to clique computation on a large graph. In this paper we tackle the problem head on: our proposal, PrivTrie, directly collects frequent terms from users by iteratively constructing a trie under LDP. While the methodology of building a trie is an obvious choice, obtaining an accurate trie under LDP is highly challenging. PrivTrie achieves this with a novel adaptive approach that conserves privacy budget by building internal nodes of the trie with the lowest level of accuracy necessary. Experiments using real datasets confirm that PrivTrie achieves high accuracy on common privacy levels, and consistently outperforms all previous methods.
KW - Frequent term
KW - Trie
KW - local differential privacy
UR - https://www.scopus.com/pages/publications/85057076327
U2 - 10.1109/ICDE.2018.00079
DO - 10.1109/ICDE.2018.00079
M3 - Conference contribution
AN - SCOPUS:85057076327
T3 - Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
SP - 821
EP - 832
BT - Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 April 2018 through 19 April 2018
ER -