TY - CHAP
T1 - ALP
T2 - An Arabic Linguistic Pipeline
AU - Freihat, Abed Alhakim
AU - Bella, Gábor
AU - Abbas, Mourad
AU - Mubarak, Hamdy
AU - Giunchiglia, Fausto
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - This paper presents ALP, an entirely new linguistic pipeline for natural language processing of text in Modern Standard Arabic. In contrast to the conventional pipeline architecture, we solve the common NLP operations of word segmentation, POS tagging, and named entity recognition as a single sequence labeling task. Based on this single component, we also introduce a new lemmatizer that combines machine-learning-based and dictionary-based approaches, the latter adding accuracy, robustness, and flexibility to the former. In addition, we present a base phrase chunker, an essential tool in many NLP operations. The presented pipeline configuration results in faster operation and addresses the challenges of processing Modern Standard Arabic, such as its rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels.
AB - This paper presents ALP, an entirely new linguistic pipeline for natural language processing of text in Modern Standard Arabic. In contrast to the conventional pipeline architecture, we solve the common NLP operations of word segmentation, POS tagging, and named entity recognition as a single sequence labeling task. Based on this single component, we also introduce a new lemmatizer that combines machine-learning-based and dictionary-based approaches, the latter adding accuracy, robustness, and flexibility to the former. In addition, we present a base phrase chunker, an essential tool in many NLP operations. The presented pipeline configuration results in faster operation and addresses the challenges of processing Modern Standard Arabic, such as its rich morphology, agglutinative aspects, and lexical ambiguity due to the absence of short vowels.
UR - https://www.scopus.com/pages/publications/85149434459
U2 - 10.1007/978-3-031-11035-1_4
DO - 10.1007/978-3-031-11035-1_4
M3 - Chapter
AN - SCOPUS:85149434459
T3 - Signals and Communication Technology
SP - 67
EP - 99
BT - Signals and Communication Technology
PB - Springer Science and Business Media Deutschland GmbH
ER -