LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content

  • Mohamed Bayan Kmainasi
  • Ali Ezzat Shahroor
  • Maram Hasanain
  • Sahinur Rahman Laskar
  • Naeemul Hassan
  • Firoj Alam

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Large Language Models (LLMs) have demonstrated remarkable success as general-purpose task solvers across various fields. However, their capabilities remain limited when addressing domain-specific problems, particularly in downstream NLP tasks. Research has shown that models fine-tuned on instruction-based downstream NLP datasets outperform those that are not fine-tuned. While most efforts in this area have primarily focused on resource-rich languages like English and broad domains, little attention has been given to multilingual settings and specific domains. To address this gap, this study focuses on developing a specialized LLM, LlamaLens, for analyzing news and social media content in a multilingual context. To the best of our knowledge, this is the first attempt to tackle both domain specificity and multilinguality, with a particular focus on news and social media. Our experimental setup includes 18 tasks, represented by 52 datasets covering Arabic, English, and Hindi. We demonstrate that LlamaLens outperforms the current state-of-the-art (SOTA) on 23 testing sets, and achieves comparable performance on 8 sets. We make the models and resources publicly available for the research community.

Original language: English
Title of host publication: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
Subtitle of host publication: Proceedings of the Conference Findings, NAACL 2025
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Publisher: Association for Computational Linguistics (ACL)
Pages: 5642-5664
Number of pages: 23
ISBN (Electronic): 9798891761957
Publication status: Published - 2025
Event: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025 - Albuquerque, United States
Duration: 29 Apr 2025 - 4 May 2025

Publication series

Name: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025

Conference

Conference: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025
Country/Territory: United States
City: Albuquerque
Period: 29/04/25 - 4/05/25
