TY - JOUR
T1 - Comparing named entity recognition on transcriptions and written texts
AU - Alam, Firoj
AU - Magnini, Bernardo
AU - Zanoli, Roberto
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
AB - Recognizing named entities (e.g., person, location, and organization names) in text has proven to be an important task in several areas of natural language processing, including Information Retrieval and Information Extraction. However, despite the efforts and achievements in Named Entity Recognition from written texts, the problem of recognizing named entities in automatic transcriptions of spoken documents is still far from solved. The output of Automatic Speech Recognition (ASR) often contains transcription errors; in addition, many named entities are out-of-vocabulary words, which makes them unavailable to the ASR system. This paper presents a comparative analysis of extracting named entities from written texts and from transcriptions. For transcriptions, we used spoken broadcast news; for written texts, we used both newspapers from the same domain as the transcriptions and manual transcriptions of the broadcast news. The comparison was carried out through a number of experiments using the best Named Entity Recognition system presented at Evalita 2007.
KW - Automatic transcriptions
KW - Entity detection
KW - Named entity recognition
KW - Written texts
UR - https://www.scopus.com/pages/publications/84926640994
U2 - 10.1007/978-3-319-14206-7_4
DO - 10.1007/978-3-319-14206-7_4
M3 - Article
AN - SCOPUS:84926640994
SN - 1860-949X
VL - 589
SP - 71
EP - 89
JO - Studies in Computational Intelligence
JF - Studies in Computational Intelligence
ER -