Abstract
This paper presents a Named Entity Recognition (NER) system for broadcast news transcriptions that combines two different classifiers. In addition, we present a comparative analysis of the results obtained by extracting Named Entities from two different types of documents: written documents and spoken documents. Written documents are those in which the text appears in standard written form, e.g. newspaper articles. Spoken (transcribed) documents are those in which orthographic information and punctuation are missing. In transcribed documents, the absence of these two key features often degrades Named Entity (NE) recognition performance. A further source of error is the Automatic Speech Recognition (ASR) system itself, which does not always recognize the correct sequence of words; these transcription errors reduce NER performance even more. The system achieved the best result in the Italian NER task at Evalita 2011, with an F1 score of 63.50%. The results of this study will be considered for integration into Typhoon [3], an NER system developed by the HLT group at FBK, so that it can also handle transcribed broadcast news.
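To make the contrast between written and transcribed documents concrete, the following Python sketch approximates how a written sentence looks once orthographic information (here, letter case) and punctuation are removed. It is an illustrative assumption only, not part of the described system: the function name `to_transcript_style` and the example sentence are hypothetical, and the sketch does not simulate the word-recognition errors that an ASR system also introduces.

```python
import re


def to_transcript_style(text: str) -> str:
    """Approximate ASR-style output by dropping punctuation and letter case.

    Hypothetical helper for illustration; real ASR output also contains
    misrecognized, inserted, or deleted words, which are not modeled here.
    """
    # Replace every character that is neither a word character nor whitespace
    # (i.e. punctuation) with a space; \w is Unicode-aware, so "è" is kept.
    no_punct = re.sub(r"[^\w\s]", " ", text)
    # Lowercase to remove case information and collapse repeated whitespace.
    return " ".join(no_punct.lower().split())


if __name__ == "__main__":
    written = "Il presidente Napolitano è arrivato a Roma, ieri."
    print(to_transcript_style(written))
    # il presidente napolitano è arrivato a roma ieri
```

In the output, the names "Napolitano" and "Roma" can no longer be distinguished by capitalization, a cue that a classifier trained on written text may rely on.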
| Original language | English |
|---|---|
| Publication status | Published - 2011 |
| Externally published | Yes |
| Event | International Workshop on Evaluation of Natural Language and Speech Tools for Italian, EVALITA 2011 - Rome, Italy. Duration: 24 Jan 2012 → 25 Jan 2012 |
Conference
| Conference | International Workshop on Evaluation of Natural Language and Speech Tools for Italian, EVALITA 2011 |
|---|---|
| Country/Territory | Italy |
| City | Rome |
| Period | 24/01/12 → 25/01/12 |