Abstract
This study aims to develop an effective and precise methodology for detecting AI-generated text by combining transformer models with stylometric features. The research used two datasets provided by the AuTexTification (Automated Text Identification) shared task, a component of IberLEF 2023, the 5th Workshop on Iberian Languages Evaluation Forum, held at the SEPLN 2023 Conference. Our team took part in both English-language subtasks: binary classification of texts as either human-written or AI-generated, and multiclass classification to predict which of six AI writing models produced a text. Our main approach was to experiment with multiple transformer models while also building an extensive stylometric feature engineering workflow. Each method (transformers and stylometric features) was first applied separately, and we then explored various ways to combine them. The best-performing method was an ensemble using majority voting that combined the two transformer models that were most accurate on our training data with a concatenation of many different stylometric feature groups. The macro-F1 scores on the test sets for subtasks 1 and 2 were 60.78 and 55.87, respectively, placing our group above the median of the competing teams. This study underscores the potential of combining transformer models and stylometric features to improve the accuracy of AI-generated text detection.
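To make the two ingredients named in the abstract concrete, here is a minimal sketch of a stylometric feature extractor and a majority-voting combiner. It is an illustration under stated assumptions, not the authors' implementation: the function names `stylometric_features` and `majority_vote`, the four features, the tie-breaking rule, and the example votes are all hypothetical, and the abstract does not specify which features or classifiers were actually used.

```python
from collections import Counter
import re


def stylometric_features(text):
    """Return a few simple, illustrative stylometric features for one text.

    These four features are assumptions chosen for illustration only.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "punctuation_per_word": sum(text.count(c) for c in ",;:") / n_words,
    }


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one text.

    Ties go to the label that appears first in `predictions`
    (an assumed rule; the abstract does not describe tie-breaking).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical votes: two transformer classifiers and one classifier
# trained on stylometric features.
votes = ["generated", "human", "generated"]
print(majority_vote(votes))
```

In a setup like the one described, each vote would come from a trained model; the feature dictionary above would be the input to the stylometric classifier rather than a prediction in itself.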
| Original language | English |
|---|---|
| Journal | CEUR Workshop Proceedings |
| Volume | 3496 |
| Publication status | Published - Sept 2023 |
| Event | 2023 Iberian Languages Evaluation Forum, IberLEF 2023 - Jaen, Spain; Duration: 26 Sept 2023 → … |
Keywords
- AI-writing detection
- ensemble learning
- stylometry
- transformers
Fingerprint
Dive into the research topics of 'AI-Writing Detection Using an Ensemble of Transformers and Stylometric Features'. Together they form a unique fingerprint.