TY - GEN
T1 - The Language Model, Resources, and Computational Pipelines for the Under-Resourced Iranian Azerbaijani
AU - Nouri, Marzia
AU - Amani, Mahsa
AU - Zohrabi, Reihaneh
AU - Asgari, Ehsaneddin
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023/11
Y1 - 2023/11
N2 - Iranian Azerbaijani is a dialect of the Azerbaijani language spoken by more than 16% of the population of Iran (>14 million). Unfortunately, a lack of computational resources is one of the factors that put this language and its rich culture at risk of extinction. This work aims to create fundamental natural language processing (NLP) resources and pipelines for the processing and analysis of Iranian Azerbaijani, introducing standard datasets and starter models for various NLP tasks such as language modeling, text classification, part-of-speech (POS) tagging, and machine translation. The proposed resources have been curated and preprocessed to facilitate the development of NLP models for Iranian Azerbaijani and to provide a strong baseline for further research and development. This study is an example of bridging the gap in NLP for low-resource languages and promoting the advancement of language technologies for underrepresented languages. To the best of our knowledge, this paper is the first to present major infrastructure for the processing and analysis of Iranian Azerbaijani, with the ultimate goal of improving communication and information access for millions of individuals. Furthermore, an online demo of our translation model is accessible at https://azeri.parsi.ai/.
UR - https://www.scopus.com/pages/publications/105027202146
U2 - 10.18653/v1/2023.ijcnlp-short.19
DO - 10.18653/v1/2023.ijcnlp-short.19
M3 - Conference contribution
AN - SCOPUS:105027202146
T3 - Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Long Papers, IJCNLP-AACL 2023
SP - 166
EP - 174
BT - Short Papers
A2 - Park, Jong C.
A2 - Arase, Yuki
A2 - Hu, Baotian
A2 - Lu, Wei
A2 - Wijaya, Derry
A2 - Purwarianti, Ayu
A2 - Krisnadhi, Adila Alfa
PB - Association for Computational Linguistics (ACL)
T2 - 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP-AACL 2023
Y2 - 1 November 2023 through 4 November 2023
ER -