TY - GEN
T1 - GenAI Content Detection Task 3
T2 - 1st Workshop on GenAI Content Detection, GenAIDetect 2025
AU - Dugan, Liam
AU - Zhu, Andrew
AU - Alam, Firoj
AU - Nakov, Preslav
AU - Apidianaki, Marianna
AU - Callison-Burch, Chris
N1 - Publisher Copyright: © 2025 International Conference on Computational Linguistics.
PY - 2025/1/15
Y1 - 2025/1/15
N2 - Recently there have been many shared tasks targeting the detection of generated text from Large Language Models (LLMs). However, these shared tasks tend to focus either on cases where text is limited to one particular domain or cases where text can be from many domains, some of which may not be seen during test time. In this shared task, using the newly released RAID benchmark, we aim to answer whether or not models can detect generated text from a large, yet fixed, number of domains and LLMs, all of which are seen during training. Over the course of three months, our task was attempted by 9 teams with 23 detector submissions. We find that multiple participants were able to obtain accuracies of over 99% on machine-generated text from RAID while maintaining a 5% False Positive Rate, suggesting that detectors are able to robustly detect text from many domains and models simultaneously. We discuss potential interpretations of this result and provide directions for future research.
UR - https://www.scopus.com/pages/publications/105000202491
U2 - 10.48550/arXiv.2501.08913
DO - 10.48550/arXiv.2501.08913
M3 - Conference contribution
AN - SCOPUS:105000202491
T3 - Proceedings - International Conference on Computational Linguistics, COLING
SP - 377
EP - 388
BT - GenAIDetect 2025 - Proceedings of the 1st Workshop on GenAI Content Detection, Proceedings of the Workshop - 31st International Conference on Computational Linguistics, COLING 2025
A2 - Alam, Firoj
A2 - Nakov, Preslav
A2 - Habash, Nizar
A2 - Gurevych, Iryna
A2 - Chowdhury, Shammur
A2 - Shelmanov, Artem
A2 - Wang, Yuxia
A2 - Artemova, Ekaterina
A2 - Kutlu, Mucahid
A2 - Mikros, George
PB - Association for Computational Linguistics (ACL)
Y2 - 19 January 2025
ER -