Developing an English-Greek comparable corpus using web texts

Georgios Mikros, Villy Tsakona, Maria Drakopoulou, Alexandra Koutra, Evangelia Triantafylli, Sofia Trypanagnostopoulou

Research output: Other contribution (peer-reviewed)

Abstract

This paper presents a project on the compilation of comparable corpora of web texts in English and Greek. The project was developed as part of the course “Introduction to Bilingual Lexicography” in the Interfaculty M.A. Programme “Lexicography: Theory and Applications” of the Faculty of English Studies. The corpus was developed both to train prospective lexicographers in creating and using such resources in their work and to assist them with projects assigned during their postgraduate studies. We first outline the characteristics of comparable corpora and the advantages of their use in lexicography (section 2), and then present the details of the English-Greek Comparable Corpus (henceforth EGCC): the corpus size and content, the compilation procedure, and the metadata gathered for the texts included. Part of the corpus has been used to develop a quantitative method for judging content comparability between English and Greek texts (section 3). In particular, the medical subcorpus was used as a test bed to evaluate the suitability of the corpus as a linguistic resource for extracting bilingual terminology for lexicographical and educational uses (section 4). Section 5 summarizes the main findings of the study and discusses future prospects.
Original language: English
Number of pages: 11
Publication status: Published - 2008
Externally published: Yes
