Abstract
This study explores the feasibility of cross-linguistic authorship attribution and author gender identification using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English using Google's Neural MT. A Random Forest algorithm was employed for authorship and gender profiling, using different feature groups [Author's Multilevel N-gram Profiles, quantitative linguistics (QL), and cross-lingual word embeddings (CLWE)] in both original and translated texts. Results indicate that MT is a viable method for converting a multilingual corpus into one language for authorship attribution and gender profiling research, with considerable accuracy when the training and testing datasets are in the same language. In the pure cross-linguistic scenario, CLWE and QL features achieved higher accuracies than the baselines.
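As a rough illustration of the kind of stylometric features the abstract mentions, the sketch below builds relative frequencies of character and word n-grams at several levels and adds one lexical diversity measure. It is a minimal sketch under stated assumptions, not the study's feature set: the function names, the chosen n-gram levels, whitespace tokenization, and the type-token ratio as the QL feature are all hypothetical choices made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Count overlapping word n-grams using whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def type_token_ratio(text: str) -> float:
    """Simple lexical diversity measure: distinct words / total words."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def feature_vector(text: str) -> dict:
    """Combine relative n-gram frequencies at several levels with one QL feature.

    The levels (character 2/3-grams, word 1/2-grams) are an assumption made
    for this sketch, not the published profile definition.
    """
    features = {}
    for level, counter in (
        ("c2", char_ngrams(text, 2)),
        ("c3", char_ngrams(text, 3)),
        ("w1", word_ngrams(text, 1)),
        ("w2", word_ngrams(text, 2)),
    ):
        total = sum(counter.values())
        for gram, count in counter.items():
            features[f"{level}:{gram}"] = count / total
    features["ql:ttr"] = type_token_ratio(text)
    return features
```

Feature dictionaries of this shape could then be vectorized and passed to a classifier such as a Random Forest, for example via scikit-learn's `DictVectorizer` and `RandomForestClassifier`. Applying the same extraction to both the original and the machine-translated texts would keep the two conditions comparable.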
| Original language | English |
|---|---|
| Pages (from-to) | 954-967 |
| Number of pages | 14 |
| Journal | Digital Scholarship in the Humanities |
| Volume | 39 |
| Issue number | 3 |
| Early online date | Jun 2024 |
| Publication status | Published - 5 Jun 2024 |
Keywords
- Authors' Multilevel N-gram Profiles
- Machine Translation
- author profiling
- authorship attribution
- lexical diversity
- multilingual word embeddings
Title
Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap