This thesis aims to reliably identify the latent topics that concern smart city residents and to protect
their privacy when they use Large Language Models (LLMs). In particular, it seeks to identify
residents’ key perceptions of and reactions to global issues, including climate change.
For example, it explores the dominant news items discussed and shared by smart city residents. To
that end, we propose a topic modeling approach that leverages the strength of model embeddings
in semantic search and the reliability of reverse dictionaries to identify relevant topics.
This novel topic modeling approach is competitive with state-of-the-art approaches, including
Latent Dirichlet Allocation (LDA) and BERTopic. With accurate topic identification, smart city
decision-makers can prioritize and tailor relevant services to enhance residents’ satisfaction and
quality of life.
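As a rough illustration of the idea described above, the sketch below assigns each post to the reverse-dictionary headword whose definition is most similar to it, then counts the dominant topics. It is a minimal sketch under stated assumptions, not the thesis's implementation: the dictionary entries, the function names, and the bag-of-words `embed` function (a stand-in for real LLM embeddings) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical reverse dictionary: headword -> definition text.
REVERSE_DICTIONARY = {
    "climate change": "long term shift in global temperature and weather patterns",
    "public transport": "shared bus metro and train services for city travel",
    "healthcare": "medical services hospitals and clinics for treating patients",
}

def embed(text):
    """Toy stand-in for an LLM embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_topic(text):
    """Return the headword whose definition is most similar to the text."""
    query = embed(text)
    return max(
        REVERSE_DICTIONARY,
        key=lambda head: cosine(query, embed(REVERSE_DICTIONARY[head])),
    )

def dominant_topics(posts):
    """Count how often each headword is the nearest topic, most frequent first."""
    return Counter(nearest_topic(p) for p in posts).most_common()

posts = [
    "rising global temperature and extreme weather",
    "the metro and bus services were late",
    "global weather patterns keep changing",
]
print(dominant_topics(posts))
```

In a realistic setting, `embed` would be replaced by a learned embedding model, so that semantically related posts match a definition even without shared words.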
Moreover, this thesis aims to protect the privacy of smart city residents. In the era of LLMs,
smart city residents are expected to interact with this technology frequently,
for example by asking diverse questions. However, these questions are not always general;
some are privacy-sensitive, making it critical to protect the prompting privacy of LLM end
users. This need becomes even more pressing in scenarios where smart city residents do not
fully trust the LLM service provider, which is a reasonable concern for privacy-conscious residents.
Typical privacy protection techniques, including encryption, offer limited protection for prompting
privacy because typical LLM frameworks must decode prompts to plain text in order
to process them and generate relevant responses. As a result, providers see
the exact prompts, exposing end users to privacy risks. To address this privacy concern, we
propose submitting the embeddings of the prompts instead of the plain-text prompts.
Since embeddings are irreversible, LLM providers can reconstruct and use only approximate
(not exact) prompts, which mitigates the risk of confidently identifying the exact prompts and
helps prevent potentially serious privacy consequences. For the prompt reconstruction process to
strike a tradeoff between prompting privacy and utility (maintaining the semantic meaning
of the prompt), we propose utilizing the embeddings of reverse dictionaries, which can reliably
project the embedding of a user prompt onto a semantically relevant item within the reverse
dictionary. Through reliable topic modeling and mitigation of the privacy concerns associated with LLM usage,
this thesis provides actionable insights and safeguards resident privacy in smart city applications.
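The prompt-submission flow described above can be sketched as follows: the client sends only a vector, and the provider projects that vector onto its nearest reverse-dictionary item instead of reading the plain prompt. This is a minimal sketch under stated assumptions. The vocabulary, the dictionary entries, and the names `client_submit` and `provider_reconstruct` are hypothetical, and the word-count `embed` function only illustrates the projection step; it does not demonstrate the irreversibility of real LLM embeddings.

```python
import math

# Hypothetical fixed vocabulary shared by client and provider; a real system
# would use a learned embedding model instead.
VOCAB = ["symptoms", "diabetes", "treatment", "loan", "debt", "repay", "visa", "travel"]

# Hypothetical reverse-dictionary entries held by the provider:
# headword -> definition text.
ENTRIES = {
    "managing a chronic illness": "symptoms and treatment of diabetes",
    "personal finance": "how to repay a loan or debt",
    "immigration": "visa rules for travel",
}

def embed(text):
    """Toy embedding: counts of each vocabulary term in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def client_submit(prompt):
    """Client side: only the vector leaves the device, not the plain prompt."""
    return embed(prompt)

def provider_reconstruct(vector):
    """Provider side: project the vector onto the nearest dictionary headword."""
    return max(ENTRIES, key=lambda head: cosine(vector, embed(ENTRIES[head])))

prompt = "my father has diabetes what treatment helps his symptoms"
vector = client_submit(prompt)
approximate = provider_reconstruct(vector)
print(approximate)  # a headword from ENTRIES, not the original prompt text
```

The provider works with the headword `approximate`, which keeps the general meaning of the request while dropping personal details such as who the question is about.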
| Date of Award | 2025 |
|---|---|
| Original language | American English |
| Awarding Institution | HBKU College of Science and Engineering |
Leveraging LLM Embeddings and Reverse Dictionaries for Reliable Topic Modeling and Privacy-Sensitive Smart City Applications: Toward Residents’ Satisfaction and Safety
Mohammed, E. (Author). 2025
Student thesis: Master's Dissertation