TY - GEN
T1 - Train Without Strain
T2 - 2025 IEEE International Conference on Communications, ICC 2025
AU - Hamood, Moqbel
AU - Albaseer, Abdullatif
AU - Abdallah, Mohamed
AU - Al-Fuqaha, Ala
AU - Hamdaoui, Bechir
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Deploying transformer models in Personalized Federated Learning (PFL) over wireless networks is challenging due to their large size, which leads to high communication overhead, increased latency, and excessive energy consumption. Traditional pruning and sparsification methods, designed mainly for conventional deep learning architectures, are ineffective for transformers and can cause divergence or degraded performance, especially when applied to self-attention layers or through direct federated averaging. To address these challenges, we propose a novel dual approach called PFL-TPS (PFL with Transformer Pruning and Sparsification). Our approach efficiently reduces communication and computation costs while maintaining model performance, making it suitable for resource-constrained wireless networks. Specifically, we apply adaptive pruning with trainable thresholds to the transformer's Feed-Forward Layers (FFLs), and only these trainable thresholds are shared with the server, resulting in minimal uploaded data. For the Self-Attention Layers (SALs), instead of transmitting bandwidth-intensive model parameters, we employ a server-side hypernetwork that generates personalized parameters based on device-specific embedding vectors sent by the devices, significantly reducing communication overhead while maintaining personalization. Extensive experiments show that PFL-TPS reduces energy consumption by up to 50%, decreases training time by 60.44%, and improves model accuracy by 49.87% compared to baselines in wireless networks.
AB - Deploying transformer models in Personalized Federated Learning (PFL) over wireless networks is challenging due to their large size, which leads to high communication overhead, increased latency, and excessive energy consumption. Traditional pruning and sparsification methods, designed mainly for conventional deep learning architectures, are ineffective for transformers and can cause divergence or degraded performance, especially when applied to self-attention layers or through direct federated averaging. To address these challenges, we propose a novel dual approach called PFL-TPS (PFL with Transformer Pruning and Sparsification). Our approach efficiently reduces communication and computation costs while maintaining model performance, making it suitable for resource-constrained wireless networks. Specifically, we apply adaptive pruning with trainable thresholds to the transformer's Feed-Forward Layers (FFLs), and only these trainable thresholds are shared with the server, resulting in minimal uploaded data. For the Self-Attention Layers (SALs), instead of transmitting bandwidth-intensive model parameters, we employ a server-side hypernetwork that generates personalized parameters based on device-specific embedding vectors sent by the devices, significantly reducing communication overhead while maintaining personalization. Extensive experiments show that PFL-TPS reduces energy consumption by up to 50%, decreases training time by 60.44%, and improves model accuracy by 49.87% compared to baselines in wireless networks.
KW - Learnable Thresholds
KW - Personalized Federated Learning (PFL)
KW - Pruning
KW - Resource Optimization
KW - Sparse Models
KW - Transformers
UR - https://www.scopus.com/pages/publications/105018472874
U2 - 10.1109/ICC52391.2025.11161879
DO - 10.1109/ICC52391.2025.11161879
M3 - Conference contribution
AN - SCOPUS:105018472874
T3 - IEEE International Conference on Communications
SP - 3575
EP - 3580
BT - ICC 2025 - IEEE International Conference on Communications
A2 - Valenti, Matthew
A2 - Reed, David
A2 - Torres, Melissa
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 8 June 2025 through 12 June 2025
ER -
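
Note: The abstract describes two mechanisms: trainable-threshold pruning of the feed-forward layers (so devices upload only the thresholds) and a server-side hypernetwork that maps a device-specific embedding to personalized self-attention parameters. The following is a minimal PyTorch sketch of both ideas, written from the abstract alone. It assumes magnitude-based soft thresholding and square, bias-free attention projections; all names (ThresholdPrunedFFL, SALHyperNetwork, embed_dim, and so on) are illustrative, not the authors' implementation.

# Hypothetical sketch of the two ideas described in the abstract (PyTorch).
# Class and parameter names are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ThresholdPrunedFFL(nn.Module):
    """Feed-forward layer whose weights are soft-pruned by a trainable threshold.

    Only the scalar thresholds (one per linear layer here; the paper's
    granularity is not specified) would be uploaded to the server, which is
    what keeps the communicated payload small.
    """

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)
        # Trainable pruning thresholds (assumed initialization).
        self.t1 = nn.Parameter(torch.tensor(1e-3))
        self.t2 = nn.Parameter(torch.tensor(1e-3))

    @staticmethod
    def _soft_prune(w: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Magnitude-based soft thresholding: weights with |w| < |t| become
        # exactly zero, and the operation stays differentiable in t, so the
        # threshold itself can be learned during local training.
        return torch.sign(w) * F.relu(w.abs() - t.abs())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.gelu(F.linear(x, self._soft_prune(self.fc1.weight, self.t1), self.fc1.bias))
        return F.linear(h, self._soft_prune(self.fc2.weight, self.t2), self.fc2.bias)


class SALHyperNetwork(nn.Module):
    """Server-side hypernetwork: maps a device embedding to per-device
    self-attention parameters, so devices upload a short embedding vector
    instead of full attention weight matrices."""

    def __init__(self, embed_dim: int, d_model: int):
        super().__init__()
        self.d_model = d_model
        out = 4 * d_model * d_model  # Q, K, V, and output projections
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, out)
        )

    def forward(self, device_embedding: torch.Tensor) -> dict[str, torch.Tensor]:
        flat = self.net(device_embedding)
        q, k, v, o = flat.chunk(4, dim=-1)
        shape = (self.d_model, self.d_model)
        return {"W_q": q.reshape(shape), "W_k": k.reshape(shape),
                "W_v": v.reshape(shape), "W_o": o.reshape(shape)}


if __name__ == "__main__":
    ffl = ThresholdPrunedFFL(d_model=64, d_hidden=256)
    y = ffl(torch.randn(2, 10, 64))       # (batch, seq, d_model)
    hyper = SALHyperNetwork(embed_dim=16, d_model=64)
    params = hyper(torch.randn(16))       # one device's embedding
    print(y.shape, params["W_q"].shape)   # torch.Size([2, 10, 64]) torch.Size([64, 64])

Under these assumptions, the communication saving the abstract emphasizes falls out directly: per round, a device uploads two scalars per FFL plus one short embedding vector, rather than full FFL and attention weight matrices.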