Word length, word frequencies and Zipf's law in the Greek language

Nick Hatzigeorgiu, George Mikros, George Carayannis

Research output: Contribution to journal › Article › peer-review

39 Citations (Scopus)

Abstract

The aim of this paper is to report for the first time the 1000 most common words and lemmas of Modern Greek and some of their quantitative characteristics. The frequency word list produced is based on the Hellenic National Corpus (HNC), a corpus of Modern Greek consisting of about 13 million words of written text. In particular, we investigate how Zipf's law applies to both the 1000 most common words and the 1000 most common lemmas. In addition, we examine the frequency distribution of the grammatical categories in the 1000 most common words and lemmas, the average word length in the whole HNC, and the growth of the average word length as a function of the number of most common words considered.
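As an illustration of the quantities named in the abstract, the following is a minimal sketch of how a rank-frequency list, a Zipf exponent, and the average length of the most common words could be computed from a tokenized text. It is not the authors' procedure: the function names are hypothetical, the exponent is estimated by an ordinary least-squares fit of log frequency against log rank (one common choice among several), and word length is averaged over distinct word forms rather than weighted by frequency, which is an assumption.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies ordered from the most to the least common word."""
    counts = Counter(tokens)
    return [freq for _, freq in counts.most_common()]


def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ C / r**s by least squares on log f versus log r.

    `frequencies` must be ordered by rank (rank 1 first) and contain at
    least two positive values.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a Zipf-like list; report its magnitude.
    return -sxy / sxx


def mean_length_of_top_words(tokens, top_n):
    """Average length in characters of the top_n most common word forms.

    Assumption: each distinct word form counts once (unweighted average).
    """
    top = [word for word, _ in Counter(tokens).most_common(top_n)]
    return sum(len(word) for word in top) / len(top)
```

Calling `mean_length_of_top_words` for increasing values of `top_n` gives a curve of average word length against the number of most common words, which is the kind of growth the abstract describes. An exponent near 1 from `zipf_exponent` would indicate close agreement with the classical form of Zipf's law.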

Original language: English
Pages (from-to): 175-185
Number of pages: 11
Journal: Journal of Quantitative Linguistics
Volume: 8
Issue number: 3
Publication status: Published - 9 Aug 2010
Externally published: Yes
