TY - JOUR
T1 - Overview of the CLEF-2025 CheckThat! Lab Task 1 on Subjectivity in News Articles
AU - Ruggeri, Federico
AU - Muti, Arianna
AU - Korre, Katerina
AU - Struß, Julia Maria
AU - Siegel, Melanie
AU - Wiegand, Michael
AU - Alam, Firoj
AU - Biswas, Md Rafiul
AU - Zaghouani, Wajdi
AU - Nawrocka, Maria
AU - Ivasiuk, Bogdan
AU - Razvan, Gogu
AU - Mihail, Andreiana
N1 - Publisher Copyright:
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2025
Y1 - 2025
N2 - We present an overview of Task 1 of the eighth edition of the CheckThat! lab at the 2025 edition of the Conference and Labs of the Evaluation Forum (CLEF). The task required participants to determine whether individual sentences from news articles expressed subjective viewpoints, such as opinions or personal bias, or presented objective, fact-based information. The task was offered in nine languages: Arabic, Bulgarian, English, German, Italian, Greek, Polish, Romanian, and Ukrainian, as well as in a multilingual setting. We curated datasets for each language, comprising roughly 14,000 sentences sourced from diverse news outlets. Participants were tasked with developing classification systems to identify subjectivity (personal opinions or biases) and objectivity (factual information) at the sentence level. A total of 22 teams participated in the task, submitting 436 valid runs across all language tracks. Most systems were based on transformer models, with approaches ranging from fine-tuning language-specific and multilingual encoders to applying English-centric models in combination with machine translation. Several teams also experimented with ensemble techniques, handcrafted features, and in-context learning using large language models. Systems were evaluated using macro-averaged F1 score to ensure equal weighting of subjective and objective classes. Performance varied considerably by language: German, Italian, English, and Romanian yielded the highest results. In contrast, Greek and Ukrainian emerged as the most challenging languages, with no team surpassing the 0.65 and 0.51 F1 score marks, respectively. Task 1 offers a valuable benchmark for the development and evaluation of multilingual subjectivity detection systems. This paper presents an overview of Task 1, including datasets, system strategies, and outcomes, contributing to broader research efforts aimed at improving the transparency and trustworthiness of automated content analysis.
AB - We present an overview of Task 1 of the eighth edition of the CheckThat! lab at the 2025 edition of the Conference and Labs of the Evaluation Forum (CLEF). The task required participants to determine whether individual sentences from news articles expressed subjective viewpoints, such as opinions or personal bias, or presented objective, fact-based information. The task was offered in nine languages: Arabic, Bulgarian, English, German, Italian, Greek, Polish, Romanian, and Ukrainian, as well as in a multilingual setting. We curated datasets for each language, comprising roughly 14,000 sentences sourced from diverse news outlets. Participants were tasked with developing classification systems to identify subjectivity (personal opinions or biases) and objectivity (factual information) at the sentence level. A total of 22 teams participated in the task, submitting 436 valid runs across all language tracks. Most systems were based on transformer models, with approaches ranging from fine-tuning language-specific and multilingual encoders to applying English-centric models in combination with machine translation. Several teams also experimented with ensemble techniques, handcrafted features, and in-context learning using large language models. Systems were evaluated using macro-averaged F1 score to ensure equal weighting of subjective and objective classes. Performance varied considerably by language: German, Italian, English, and Romanian yielded the highest results. In contrast, Greek and Ukrainian emerged as the most challenging languages, with no team surpassing the 0.65 and 0.51 F1 score marks, respectively. Task 1 offers a valuable benchmark for the development and evaluation of multilingual subjectivity detection systems. This paper presents an overview of Task 1, including datasets, system strategies, and outcomes, contributing to broader research efforts aimed at improving the transparency and trustworthiness of automated content analysis.
KW - fact-checking
KW - misinformation detection
KW - subjectivity classification
UR - https://www.scopus.com/pages/publications/105019038240
M3 - Conference article
AN - SCOPUS:105019038240
SN - 1613-0073
VL - 4038
SP - 681
EP - 694
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 26th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2025
Y2 - 9 September 2025 through 12 September 2025
ER -