SinaTools: Open Source Toolkit for Arabic Natural Language Processing

  • Tymaa Hammouda
  • Mustafa Jarrar*
  • Mohammed Khalilia

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

We introduce SinaTools, an open-source Python package for Arabic natural language processing and understanding. SinaTools is a unified package that can be integrated into existing system workflows. It offers solutions for tasks such as flat and nested Named Entity Recognition (NER), fully-fledged Word Sense Disambiguation (WSD), Semantic Relatedness, Synonymy Extraction and Evaluation, Lemmatization, Part-of-speech (POS) Tagging, and Root Tagging, along with helper utilities such as corpus processing, text stripping methods, and diacritic-aware word matching. This paper presents SinaTools and its benchmarking results, which show that SinaTools outperforms all similar tools on these tasks, including Flat NER (87.33%), Nested NER (89.42%), WSD (82.63%), Semantic Relatedness (0.49 Spearman rank correlation), Lemmatization (90.5%), and POS tagging (93.8%), among others. SinaTools can be downloaded from https://sina.birzeit.edu/sinatools.
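
To make the idea of diacritic-aware word matching concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the SinaTools API: the function names (`strip_diacritics`, `split_letters`, `diacritic_aware_match`) and the matching rule are assumptions chosen for illustration. The assumed rule is that two words match when their base letters are identical and, wherever both words carry diacritics on the same letter, those diacritics agree.

```python
# Illustrative sketch only; these helpers are hypothetical and are NOT the SinaTools API.

# Arabic diacritics (tashkeel): U+064B..U+0652, plus superscript alef U+0670.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics, keeping base letters and other characters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_letters(word: str):
    """Return a list of (letter, set_of_diacritics_attached_to_it) pairs."""
    units = []
    for ch in word:
        if ch in DIACRITICS:
            if units:  # attach the mark to the preceding letter
                letter, marks = units[-1]
                units[-1] = (letter, marks | {ch})
            # a diacritic with no preceding letter is ignored
        else:
            units.append((ch, frozenset()))
    return units


def diacritic_aware_match(a: str, b: str) -> bool:
    """Assumed rule: same base letters; diacritics may be missing on either
    side, but must agree wherever both words specify them."""
    ua, ub = split_letters(a), split_letters(b)
    if len(ua) != len(ub):
        return False
    for (la, ma), (lb, mb) in zip(ua, ub):
        if la != lb:
            return False
        if ma and mb and ma != mb:
            return False
    return True


kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, ta+fatha, ba+fatha
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # kaf+damma, ta+kasra, ba+fatha
ktb = "\u0643\u062A\u0628"                       # the same letters, undiacritized

print(diacritic_aware_match(kataba, ktb))     # True: undiacritized form is compatible
print(diacritic_aware_match(kataba, kutiba))  # False: conflicting diacritics
```

Under this assumed rule an undiacritized word is compatible with any diacritization of the same letters, while two differently diacritized forms are rejected. A production matcher would likely also need to handle letter normalization and partial diacritization conventions, which this sketch leaves out.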

Original language: English
Pages (from-to): 388-396
Number of pages: 9
Journal: Procedia Computer Science
Volume: 244
Publication status: Published - 2024
Externally published: Yes
Event: 6th International Conference on AI in Computational Linguistics, ACLing 2024 - Hybrid, Dubai, United Arab Emirates
Duration: 21 Sept 2024 to 22 Sept 2024

Keywords

  • Arabic
  • Lemmatization
  • Morphology
  • Named Entity Recognition
  • NLP
  • NLU
  • Part-of-speech Tagging
  • Root tagging
  • Semantic Relatedness
  • Synonymy Extraction
  • Toolkit
  • Word Sense Disambiguation
