Comparing named entity recognition on transcriptions and written texts

Firoj Alam, Bernardo Magnini*, Roberto Zanoli

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

The ability to recognize named entities (e.g., person, location and organization names) in texts has proven to be important for several natural language processing areas, including Information Retrieval and Information Extraction. However, despite the effort invested and the results achieved in Named Entity Recognition from written texts, recognizing named entities in automatic transcriptions of spoken documents is still far from solved. The output of Automatic Speech Recognition (ASR) often contains transcription errors. In addition, many named entities are out-of-vocabulary words, so the ASR system cannot produce them at all. This paper presents a comparative analysis of named entity extraction from written texts and from transcriptions. For transcriptions, we used spoken broadcast news. For written texts, we used both newspaper articles from the same domain as the transcriptions and the manual transcriptions of the broadcast news. The comparison was carried out through a number of experiments using the best Named Entity Recognition system presented at Evalita 2007.

Original language: English
Pages (from-to): 71-89
Number of pages: 19
Journal: Studies in Computational Intelligence
Volume: 589
Publication status: Published - 2015
Externally published: Yes

Keywords

  • Automatic transcriptions
  • Entity detection
  • Named entity recognition
  • Written texts

