TY - GEN
T1 - Emo3D
T2 - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025
AU - Dehghani, Mahshid
AU - Shafiee, Amirahmad
AU - Shafiei, Ali
AU - Fallah, Neda
AU - Alizadeh, Farahmand
AU - Gholinejad, Mohammad Mehdi
AU - Behroozi, Hamid
AU - Habibi, Jafar
AU - Asgari, Ehsaneddin
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025/4
Y1 - 2025/4
N2 - 3D facial emotion modeling has important applications in areas such as animation design, virtual reality, and emotional human-computer interaction (HCI). However, existing models are constrained by limited emotion classes and insufficient datasets. To address this, we introduce Emo3D, an extensive "Text-Image-Expression dataset" that spans a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions, enabling the capture of a broad range of emotional expressions. Using this unique dataset, we perform a comprehensive evaluation of fine-tuned language-based models and vision-language models, such as Contrastive Language-Image Pretraining (CLIP), for 3D facial expression synthesis. To better assess conveyed emotions, we introduce the Emo3D metric, a new evaluation metric that aligns more closely with human perception than traditional Mean Squared Error (MSE). Unlike MSE, which focuses on numerical differences, Emo3D captures emotional nuances in visual-text alignment and semantic richness. The Emo3D dataset and metric hold great potential for advancing applications in animation and virtual reality.
AB - 3D facial emotion modeling has important applications in areas such as animation design, virtual reality, and emotional human-computer interaction (HCI). However, existing models are constrained by limited emotion classes and insufficient datasets. To address this, we introduce Emo3D, an extensive "Text-Image-Expression dataset" that spans a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions, enabling the capture of a broad range of emotional expressions. Using this unique dataset, we perform a comprehensive evaluation of fine-tuned language-based models and vision-language models, such as Contrastive Language-Image Pretraining (CLIP), for 3D facial expression synthesis. To better assess conveyed emotions, we introduce the Emo3D metric, a new evaluation metric that aligns more closely with human perception than traditional Mean Squared Error (MSE). Unlike MSE, which focuses on numerical differences, Emo3D captures emotional nuances in visual-text alignment and semantic richness. The Emo3D dataset and metric hold great potential for advancing applications in animation and virtual reality.
UR - https://www.scopus.com/pages/publications/105028686627
U2 - 10.18653/v1/2025.findings-naacl.173
DO - 10.18653/v1/2025.findings-naacl.173
M3 - Conference contribution
AN - SCOPUS:105028686627
T3 - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025
SP - 3158
EP - 3172
BT - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
A2 - Chiruzzo, Luis
A2 - Ritter, Alan
A2 - Wang, Lu
PB - Association for Computational Linguistics (ACL)
Y2 - 29 April 2025 through 4 May 2025
ER -