Abstract
We introduce SinaTools, an open-source Python package for Arabic natural language processing and understanding. SinaTools is a unified package that can be integrated into existing system workflows. It offers solutions for tasks such as flat and nested Named Entity Recognition (NER), fully-fledged Word Sense Disambiguation (WSD), Semantic Relatedness, Synonymy Extraction and Evaluation, Lemmatization, Part-of-speech (POS) Tagging, and Root Tagging, along with helper utilities for corpus processing, text stripping, and diacritic-aware word matching. This paper presents SinaTools and its benchmarking results, which show that SinaTools outperforms all similar tools on the aforementioned tasks, including Flat NER (87.33%), Nested NER (89.42%), WSD (82.63%), Semantic Relatedness (0.49 Spearman rank correlation), Lemmatization (90.5%), and POS tagging (93.8%). SinaTools can be downloaded from https://sina.birzeit.edu/sinatools.
| Original language | English |
|---|---|
| Pages (from-to) | 388-396 |
| Number of pages | 9 |
| Journal | Procedia Computer Science |
| Volume | 244 |
| Publication status | Published - 2024 |
| Externally published | Yes |
| Event | 6th International Conference on AI in Computational Linguistics, ACLing 2024 - Hybrid, Dubai, United Arab Emirates. Duration: 21 Sept 2024 → 22 Sept 2024 |
Keywords
- Arabic
- Lemmatization
- Morphology
- Named Entity Recognition
- NLP
- NLU
- Part-of-speech Tagging
- Root tagging
- Semantic Relatedness
- Synonymy Extraction
- Toolkit
- Word Sense Disambiguation