Exploring Alignment in Shared Cross-lingual Spaces

Basel Mousi, Nadir Durrani, Fahim Dalvi, Majd Hawasly, Ahmed Abdelali

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Despite their remarkable ability to capture linguistic nuances across diverse languages, questions persist regarding the degree of alignment between languages in multilingual embeddings. Drawing inspiration from research on high-dimensional representations in neural language models, we employ clustering to uncover latent concepts within multilingual models. Our analysis focuses on quantifying the alignment and overlap of these concepts across various languages within the latent space. To this end, we introduce two metrics CALIGN and COLAP aimed at quantifying these aspects, enabling a deeper exploration of multilingual embeddings. Our study encompasses three multilingual models (mT5, mBERT, and XLM-R) and three downstream tasks (Machine Translation, Named Entity Recognition, and Sentiment Analysis). Key findings from our analysis include: i) deeper layers in the network demonstrate increased cross-lingual alignment due to the presence of language-agnostic concepts, ii) fine-tuning of the models enhances alignment within the latent space, and iii) such task-specific calibration helps in explaining the emergence of zero-shot capabilities in the models.

Original languageEnglish
Title of host publicationLong Papers
EditorsLun-Wei Ku, Andre F. T. Martins, Vivek Srikumar
PublisherAssociation for Computational Linguistics (ACL)
Pages6326-6348
Number of pages23
ISBN (Electronic)9798891760943
DOIs
Publication statusPublished - 24 Mar 2024
Event62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Bangkok, Thailand
Duration: 11 Aug 202416 Aug 2024

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
Volume1
ISSN (Print)0736-587X

Conference

Conference62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/TerritoryThailand
CityBangkok
Period11/08/2416/08/24

Fingerprint

Dive into the research topics of 'Exploring Alignment in Shared Cross-lingual Spaces'. Together they form a unique fingerprint.

Cite this