
Borderless Azerbaijani Processing: Linguistic Resources and a Transformer-based Approach for Azerbaijani Transliteration

  • Reihaneh Zohrabi
  • Mostafa Masumi
  • Omid Ghahroodi
  • Parham AbedAzad
  • Hamid Beigy
  • Mohammad H. Rohban
  • Ehsaneddin Asgari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Recent advancements in neural language models have revolutionized natural language understanding. However, many languages risk being left behind without the benefits of these advancements, which could ultimately contribute to their extinction. One such language is Azerbaijani in Iran, which suffers from limited digital resources and a lack of alignment between its spoken and written forms. In contrast, Azerbaijani in the Republic of Azerbaijan has more resources and is not considered as low-resource as its Iranian counterpart. In this context, our research focuses on computational progress for the Iranian Azerbaijani language. We propose a transliteration model that leverages an Azerbaijani parallel dataset to bridge the gap between the Latin and Persian scripts. By enabling seamless communication between these two scripts, our model facilitates cultural exchange and serves as a valuable tool for transfer learning. Our approach outperforms traditional rule-based methods, with significant improvements in performance metrics: we observe an increase of at least 15% in BLEU score and a reduction of at least one third in edit distance. An online demo of our model is available at https://azeri.parsi.ai/.

Original language: English
Title of host publication: Short Papers
Editors: Jong C. Park, Yuki Arase, Baotian Hu, Wei Lu, Derry Wijaya, Ayu Purwarianti, Adila Alfa Krisnadhi
Publisher: Association for Computational Linguistics (ACL)
Pages: 175-183
Number of pages: 9
ISBN (Electronic): 9798891760141
Publication status: Published - Nov 2023
Externally published: Yes
Event: 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP-AACL 2023 - Bali, Indonesia
Duration: 1 Nov 2023 to 4 Nov 2023

Publication series

Name: Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Long Papers, IJCNLP-AACL 2023
Volume: 2

Conference

Conference: 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP-AACL 2023
Country/Territory: Indonesia
City: Bali
Period: 1/11/23 to 4/11/23

