ALP: An Arabic Linguistic Pipeline

Abed Alhakim Freihat*, Gábor Bella, Mourad Abbas, Hamdy Mubarak, Fausto Giunchiglia

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

4 Citations (Scopus)

Abstract

This paper presents ALP, an entirely new linguistic pipeline for natural language processing of text in Modern Standard Arabic. In contrast to the conventional pipeline architecture, we solve the common NLP operations of word segmentation, POS tagging, and named entity recognition as a single sequence labeling task. Based on this single component, we also introduce a new lemmatizer tool that combines machine-learning-based and dictionary-based approaches, the latter providing increased accuracy, robustness, and flexibility to the former. In addition, we present a base phrase chunking tool, which is an essential tool in many NLP operations. The presented pipeline configuration results in faster operation and is able to provide a solution to the challenges of processing Modern Standard Arabic, such as the rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels.
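
To illustrate what treating segmentation, POS tagging, and named entity recognition as a single sequence labeling task can look like, here is a minimal sketch. It is not ALP's actual tag scheme or code, which the abstract does not specify. It assumes a hypothetical per-character label of the form `<B|I>-<POS>|<NER>`, where `B` opens a new segment and `I` continues it. The function name `decode_joint_labels`, the tag names, and the hand-written labels for the transliterated word `wktAbh` are all illustrative assumptions.

```python
from typing import List, Tuple


def decode_joint_labels(chars: str, labels: List[str]) -> List[Tuple[str, str, str]]:
    """Recover (segment, POS, NER) triples from one joint label per character.

    Hypothetical label format (an assumption for illustration only):
    "<B|I>-<POS>|<NER>", e.g. "B-NOUN|O".
    """
    if len(chars) != len(labels):
        raise ValueError("need exactly one label per character")

    segments: List[List[str]] = []
    for ch, label in zip(chars, labels):
        boundary_pos, ner = label.split("|")
        boundary, pos = boundary_pos.split("-", 1)
        if boundary == "B" or not segments:
            # "B" starts a new segment carrying its own POS and NER tags.
            segments.append([ch, pos, ner])
        else:
            # "I" extends the current segment by one character.
            segments[-1][0] += ch
    return [(seg, pos, ner) for seg, pos, ner in segments]


# Hand-labeled toy input: a conjunction, a noun, and a pronoun clitic
# written as one orthographic word (Latin transliteration).
word = "wktAbh"
labels = ["B-CONJ|O", "B-NOUN|O", "I-NOUN|O", "I-NOUN|O", "I-NOUN|O", "B-PRON|O"]
result = decode_joint_labels(word, labels)
```

With a scheme of this kind, a single tagger pass yields segment boundaries, part-of-speech tags, and entity tags together, so no separate segmenter has to run before the tagger.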

Original language: English
Title of host publication: Signals and Communication Technology
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 67-99
Number of pages: 33
Publication status: Published - 2023

Publication series

Name: Signals and Communication Technology
ISSN (Print): 1860-4862
ISSN (Electronic): 1860-4870
