Abstract
The aim of this paper is to explore the influence of text topic on authorship attribution. Specifically, we test the widely accepted belief that the stylometric variables commonly used in authorship attribution are topic-neutral and can therefore be used on multi-topic corpora. To investigate this hypothesis, we created a special corpus controlled simultaneously for topic and author. The corpus consists of 200 Modern Greek newswire articles written by two authors on two different topics. We calculated many commonly used stylometric variables and, for each one, performed a two-way ANOVA to estimate the main effects of author and topic as well as their interaction. The results showed that most of the variables are considerably correlated with text topic, so their use in authorship analysis should be approached with caution.
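The per-variable test described in the abstract can be illustrated with a small sketch. It assumes a balanced, fully crossed design (the same number of texts in every author-topic cell) and uses invented toy values for a single stylometric variable; the function name `two_way_anova` and the data are hypothetical and not taken from the paper. It computes F statistics for the author and topic main effects and their interaction using the standard sums of squares for a two-way ANOVA with replication.

```python
from itertools import product
from statistics import fmean


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication (hypothetical helper).

    cells maps (author, topic) -> list of values of one stylometric
    variable, one value per text. Every author-topic cell must exist and
    hold the same number of texts (at least 2).
    Returns the F statistic for each effect.
    """
    authors = sorted({a for a, _ in cells})
    topics = sorted({t for _, t in cells})
    a, b = len(authors), len(topics)
    n = len(next(iter(cells.values())))
    if len(cells) != a * b or n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be complete and balanced, n >= 2")

    # Grand mean, cell means and marginal means (balanced design).
    grand = fmean(x for v in cells.values() for x in v)
    cell_mean = {k: fmean(v) for k, v in cells.items()}
    author_mean = {i: fmean(cell_mean[(i, j)] for j in topics) for i in authors}
    topic_mean = {j: fmean(cell_mean[(i, j)] for i in authors) for j in topics}

    # Sums of squares for each source of variation.
    ss_author = b * n * sum((m - grand) ** 2 for m in author_mean.values())
    ss_topic = a * n * sum((m - grand) ** 2 for m in topic_mean.values())
    ss_inter = n * sum(
        (cell_mean[(i, j)] - author_mean[i] - topic_mean[j] + grand) ** 2
        for i, j in product(authors, topics)
    )
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    # Mean squares divided by the within-cell (error) mean square.
    ms_error = ss_error / (a * b * (n - 1))
    return {
        "author": (ss_author / (a - 1)) / ms_error,
        "topic": (ss_topic / (b - 1)) / ms_error,
        "author x topic": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }


# Toy data (invented): 2 authors x 2 topics, 2 texts per cell.
toy = {
    ("A1", "T1"): [4.0, 6.0],
    ("A1", "T2"): [6.0, 8.0],
    ("A2", "T1"): [4.0, 6.0],
    ("A2", "T2"): [6.0, 8.0],
}
print(two_way_anova(toy))
# prints {'author': 0.0, 'topic': 4.0, 'author x topic': 0.0}
```

In this toy example the variable differs only between topics, so only the topic F statistic is non-zero: the kind of topic dependence the study cautions about. Turning an F statistic into a p-value requires the F distribution, for example `scipy.stats.f.sf(F, df_effect, df_error)` if SciPy is available.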
| Original language | English |
|---|---|
| Pages (from-to) | 29-35 |
| Number of pages | 7 |
| Journal | CEUR Workshop Proceedings |
| Volume | 276 |
| Publication status | Published - 2007 |
| Externally published | Yes |
| Event | SIGIR 2007 International Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection, PAN 2007, Genoa, Italy, 5 Dec 2007 |
Keywords
- Authorship attribution
- Stylometry
- Topic-neutral features