TY - GEN
T1 - MEENA (PersianMMMU)
T2 - 19th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2026
AU - Ghahroodi, Omid
AU - Hemmat, Arshia
AU - Nouri, Marzia
AU - Hosseini, Seyed Mohammad Hadi
AU - Dastgheib, Doratossadat
AU - Sanian, Mohammad Vali
AU - Sahebi, Alireza
AU - Zohrabi, Reihaneh
AU - Rohban, Mohammad Hossein
AU - Asgari, Ehsaneddin
AU - Baghshah, Mahdieh Soleymani
N1 - Publisher Copyright: © 2026 Association for Computational Linguistics.
PY - 2026
Y1 - 2026
AB - Recent advancements in large vision-language models (VLMs) have primarily focused on English, with limited attention given to other languages. To address this gap, we introduce MEENA (also known as PersianMMMU), the first dataset designed to evaluate Persian VLMs across scientific, reasoning, and human-level understanding tasks. Our dataset comprises approximately 7,500 Persian and 3,000 English questions, covering a wide range of topics such as reasoning, mathematics, physics, diagrams, charts, and Persian art and literature. Key features of MEENA include: (1) diverse subject coverage spanning various educational levels, from primary to upper secondary school, (2) rich metadata, including difficulty levels and descriptive answers, (3) original Persian data that preserves cultural nuances, (4) a bilingual structure to assess cross-linguistic performance, and (5) a series of diverse experiments assessing various capabilities, including overall performance, the model’s ability to attend to images, and its tendency to generate hallucinations. We hope this benchmark contributes to enhancing VLM capabilities beyond English.
UR - https://www.scopus.com/pages/publications/105039013177
U2 - 10.18653/v1/2026.findings-eacl.340
DO - 10.18653/v1/2026.findings-eacl.340
M3 - Conference contribution
AN - SCOPUS:105039013177
T3 - 19th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2026
SP - 6457
EP - 6491
BT - 19th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2026
PB - Association for Computational Linguistics (ACL)
Y2 - 24 March 2026 through 29 March 2026
ER -