A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

  • Md Tahmid Rahman Laskar*
  • Sawsan Alqahtani
  • M. Saiful Bari*
  • Mizanur Rahman
  • Mohammad Abdullah Matin Khan
  • Haidar Khan
  • Israt Jahan
  • Md Amran Hossen Bhuiyan
  • Chee Wei Tan
  • Md Rizwan Parvez
  • Enamul Hoque
  • Shafiq Joty*
  • Jimmy Xiangji Huang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

17 Citations (Scopus)

Abstract

Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the complexity of the evaluation process has led to varied evaluation setups, causing inconsistencies in findings and interpretations. To address this, we systematically review the primary challenges and limitations causing these inconsistencies and unreliable evaluations in various steps of LLM evaluation. Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 13785-13816
Number of pages: 32
ISBN (Electronic): 9798891761643
Publication status: Published - 4 Jul 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States
Duration: 12 Nov 2024 - 16 Nov 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 12/11/24 - 16/11/24
