NativQA: Multilingual Culturally-Aligned Natural Query for LLMs

  • Md Arid Hasan*
  • Maram Hasanain
  • Fatema Ahmad
  • Sahinur Rahman Laskar
  • Sunaya Upadhyay
  • Vrunda N. Sukhadia*
  • Mucahid Kutlu
  • Shammur Absar Chowdhury
  • Firoj Alam

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Natural Question Answering (QA) datasets play a crucial role in evaluating the capabilities of large language models (LLMs), ensuring their effectiveness in real-world applications. Although numerous QA datasets have been developed, and some work has been done in parallel, there is a notable lack of a framework and of large-scale, region-specific datasets queried by native users in their own languages. This gap hinders effective benchmarking and the development of models fine-tuned for regional and cultural specificities. In this study, we propose a scalable, language-independent framework, NativQA, to seamlessly construct culturally and regionally aligned QA datasets in native languages for LLM evaluation and tuning. We demonstrate the efficacy of the proposed framework by designing a multilingual natural QA dataset, MultiNativQA, consisting of ∼64k manually annotated QA pairs in seven languages, ranging from high to extremely low resource, based on queries from native speakers from 9 regions covering 18 topics. We benchmark open- and closed-source LLMs with the MultiNativQA dataset. We have made the MultiNativQA dataset and the experimental scripts publicly available to the community.

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: ACL 2025
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics (ACL)
Pages: 14886-14909
Number of pages: 24
ISBN (Electronic): 9798891762565
Publication status: Published - 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 2025 to 1 Aug 2025

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/25 to 1/08/25
