The Language Model, Resources, and Computational Pipelines for the Under-Resourced Iranian Azerbaijani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Iranian Azerbaijani is a dialect of the Azerbaijani language spoken by more than 16% of the population of Iran (over 14 million people). A lack of computational resources is one of the factors that puts this language and its rich culture at risk of extinction. This work aims to create fundamental natural language processing (NLP) resources and pipelines for the processing and analysis of Iranian Azerbaijani, introducing standard datasets and starter models for NLP tasks such as language modeling, text classification, part-of-speech (POS) tagging, and machine translation. The proposed resources have been curated and preprocessed to facilitate the development of NLP models for Iranian Azerbaijani and to provide a strong baseline for further research and development. This study is an example of bridging the gap in NLP for low-resource languages and promoting the advancement of language technologies for underrepresented languages. To the best of our knowledge, this paper is the first to present major infrastructure for the processing and analysis of Iranian Azerbaijani, with the ultimate goal of improving communication and information access for millions of individuals. An online demo of our translation model is accessible at https://azeri.parsi.ai/.

Original language: English
Title of host publication: Short Papers
Editors: Jong C. Park, Yuki Arase, Baotian Hu, Wei Lu, Derry Wijaya, Ayu Purwarianti, Adila Alfa Krisnadhi
Publisher: Association for Computational Linguistics (ACL)
Pages: 166-174
Number of pages: 9
ISBN (Electronic): 9798891760141
DOIs
Publication status: Published - Nov 2023
Externally published: Yes
Event: 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP-AACL 2023 - Bali, Indonesia
Duration: 1 Nov 2023 to 4 Nov 2023

Publication series

Name: Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Long Papers, IJCNLP-AACL 2023
Volume: 2

Conference

Conference: 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP-AACL 2023
Country/Territory: Indonesia
City: Bali
Period: 1/11/23 to 4/11/23
