TY - GEN
T1 - Nabra
T2 - 1st Arabic Natural Language Processing Conference, ArabicNLP 2023
AU - Nayouf, Amal
AU - Hammouda, Tymaa Hasanain
AU - Jarrar, Mustafa
AU - Zaraket, Fadi A.
AU - Kurdy, Mohamad Bassam
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - This paper presents Nâbr̄a, a set of corpora of Syrian Arabic dialects with morphological annotations. A team of Syrian natives collected more than 6K sentences containing about 60K words from several sources, including social media posts, scripts of movies and series, song lyrics, and local proverbs, to build Nâbr̄a. Nâbr̄a covers several local Syrian dialects, including those of Aleppo, Damascus, Deir-ezzur, Hama, Homs, Huran, Latakia, Mardin, Raqqah, and Suwayda. A team of nine annotators annotated the 60K tokens with full morphological annotations across sentence contexts. We trained the annotators to follow methodological annotation guidelines to ensure unique morpheme annotations, and normalized the annotations. F1 and κ agreement scores ranged between 74% and 98% across features, showing the excellent quality of Nâbr̄a annotations. Our corpora are open-source and publicly available as part of the Currasat portal: https://sina.birzeit.edu/currasat.
UR - https://www.scopus.com/pages/publications/85176369622
U2 - 10.18653/v1/2023.arabicnlp-1.2
DO - 10.18653/v1/2023.arabicnlp-1.2
M3 - Conference contribution
AN - SCOPUS:85176369622
T3 - ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings
SP - 12
EP - 23
BT - ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings
A2 - Sawaf, Hassan
A2 - El-Beltagy, Samhaa
A2 - Zaghouani, Wajdi
A2 - Magdy, Walid
A2 - Tomeh, Nadi
A2 - Abu Farha, Ibrahim
A2 - Habash, Nizar
A2 - Khalifa, Salam
A2 - Keleg, Amr
A2 - Haddad, Hatem
A2 - Zitouni, Imed
A2 - Abdelali, Ahmed
A2 - Mrini, Khalil
A2 - Almatham, Rawan
PB - Association for Computational Linguistics (ACL)
Y2 - 7 December 2023
ER -