Abstract
Recent advancements in foundation models have improved autonomous tool usage and reasoning, but their capabilities in map-based reasoning remain underexplored. To address this, we introduce MapEval, a benchmark designed to assess foundation models across three distinct tasks (textual, API-based, and visual reasoning) through 700 multiple-choice questions spanning 180 cities and 54 countries, covering spatial relationships, navigation, travel planning, and real-world map interactions. Unlike prior benchmarks that focus on simple location queries, MapEval requires models to handle long-context reasoning, API interactions, and visual map analysis, making it the most comprehensive evaluation framework for geospatial AI. In an evaluation of 30 foundation models, including Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro, none surpasses 67% accuracy, with open-source models performing significantly worse and all models lagging over 20% behind human performance. These results expose critical gaps in spatial inference: models struggle with distances, directions, route planning, and place-specific reasoning, highlighting the need for better geospatial AI to bridge the gap between foundation models and real-world navigation. All resources are available on the project website.
| Original language | English |
|---|---|
| Pages (from-to) | 13774-13813 |
| Number of pages | 40 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 267 |
| Publication status | Published - Jul 2025 |
| Event | 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada; Duration: 13 Jul 2025 → 19 Jul 2025 |
Fingerprint
Dive into the research topics of 'MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models'. Together they form a unique fingerprint.