TY - CONF
T1 - CHARTQAPRO
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
AU - Masry, Ahmed
AU - Islam, Mohammed Saidul
AU - Ahmed, Mahir
AU - Bajaj, Aayush
AU - Kabir, Firoz
AU - Kartha, Aaryaman
AU - Laskar, Md Tahmid Rahman
AU - Rahman, Mizanur
AU - Rahman, Shadikur
AU - Shahmohammadi, Mehrad
AU - Thakkar, Megh
AU - Parvez, Md Rizwan
AU - Hoque, Enamul
AU - Joty, Shafiq
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - Charts are ubiquitous, as people often use them to analyze data, answer questions, and discover critical insights. However, performing complex analytical tasks with charts requires significant perceptual and cognitive effort. Chart Question Answering (CQA) systems automate this process by enabling models to interpret and reason with visual representations of data. Yet existing benchmarks like ChartQA lack real-world diversity and have recently shown performance saturation with modern large vision-language models (LVLMs). To address these limitations, we introduce CHARTQAPRO, a new benchmark that includes 1,341 charts from 99 diverse sources, spanning various chart types (including infographics and dashboards) and featuring 1,948 questions of various types, such as multiple-choice, conversational, hypothetical, and unanswerable questions, to better reflect real-world challenges. Our evaluations with 21 models show a substantial performance drop for LVLMs on CHARTQAPRO; e.g., Claude Sonnet 3.5 scores 90.5% on ChartQA but only 55.81% on CHARTQAPRO, underscoring the complexity of chart reasoning. We complement our findings with detailed error analyses and ablation studies, identifying key challenges and opportunities for advancing LVLMs in chart understanding and reasoning. We release CHARTQAPRO at https://github.com/visnlp/ChartQAPro.
UR - https://www.scopus.com/pages/publications/105028645316
U2 - 10.18653/v1/2025.findings-acl.978
DO - 10.18653/v1/2025.findings-acl.978
M3 - Conference contribution
AN - SCOPUS:105028645316
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 19123
EP - 19151
BT - Findings of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
Y2 - 27 July 2025 through 1 August 2025
ER -