POS tagging for improving code-switching identification in Arabic

  • Mohammed Attia
  • Younes Samih
  • Ali Elkahky
  • Hamdy Mubarak
  • Ahmed Abdelali
  • Kareem Darwish

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Citations (Scopus)

Abstract

When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern in which words and phrases from the embedded language are inserted into the matrix language. This paper explores whether this pattern can be used to improve code-switching identification between Modern Standard Arabic (MSA) and Egyptian Arabic (EA). We try to answer the question of how strong the POS signal is in word-level code-switching identification. We build a deep learning model enriched with linguistic features (including POS tags) that outperforms the state-of-the-art results by 1.9% on the development set and 1.0% on the test set. We also show that in intrasentential code-switching, the selection of lexical items is constrained by POS categories: function words tend to come more often from the dialectal language, while the majority of content words come from the standard language.
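The last observation in the abstract, that language choice differs between function words and content words, can be illustrated with a minimal sketch. It assumes tokens already annotated with a POS tag and a word-level language label. The tag names, the function-word grouping, the `MSA`/`EA` label strings, and the toy tokens are all hypothetical placeholders, not the paper's tagset, data, or results, and this is not the paper's deep learning model.

```python
from collections import Counter, defaultdict

# Hypothetical coarse grouping of POS tags into function words.
# These tag names are illustrative and are NOT the tagset used in the paper.
FUNCTION_TAGS = {"PREP", "CONJ", "PART", "PRON", "DET"}


def pos_class(tag):
    """Map a POS tag to a coarse class: 'function' or 'content'."""
    return "function" if tag in FUNCTION_TAGS else "content"


def language_share_by_pos_class(tagged_tokens):
    """Return, per POS class, the fraction of tokens carrying each language label.

    tagged_tokens: iterable of (word, pos_tag, language_label) triples.
    """
    counts = defaultdict(Counter)
    for _word, tag, lang in tagged_tokens:
        counts[pos_class(tag)][lang] += 1

    shares = {}
    for cls, lang_counts in counts.items():
        total = sum(lang_counts.values())
        shares[cls] = {lang: n / total for lang, n in lang_counts.items()}
    return shares


# Made-up annotations with placeholder tokens, for illustration only.
sample = [
    ("w1", "PREP", "EA"),
    ("w2", "NOUN", "MSA"),
    ("w3", "PART", "EA"),
    ("w4", "VERB", "MSA"),
    ("w5", "NOUN", "EA"),
    ("w6", "CONJ", "MSA"),
]

shares = language_share_by_pos_class(sample)
for cls in sorted(shares):
    print(cls, shares[cls])
```

On a real annotated corpus, comparing the two rows would show whether function words skew toward the dialectal label and content words toward the standard one, which is the pattern the abstract reports.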

Original language: English
Title of host publication: ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 18-29
Number of pages: 12
ISBN (Electronic): 9781950737321
Publication status: Published - 2019
Event: 4th Arabic Natural Language Processing Workshop, WANLP 2019, held at ACL 2019 - Florence, Italy
Duration: 1 Aug 2019 → …

Publication series

Name: ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop

Conference

Conference: 4th Arabic Natural Language Processing Workshop, WANLP 2019, held at ACL 2019
Country/Territory: Italy
City: Florence
Period: 1/08/19 → …
