Punctuation Restoration using Transformer Models for High- and Low-Resource Languages

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Punctuation restoration is a common post-processing problem for Automatic Speech Recognition (ASR) systems. It is important for improving the readability of the transcribed text for human readers and for facilitating downstream NLP tasks. Current state-of-the-art approaches address this problem with various deep learning models. Transformer models have recently proven successful in downstream NLP tasks, yet they have been explored very little for the punctuation restoration problem. In this work, we explore different transformer-based models and propose an augmentation strategy for this task, focusing on a high-resource language (English) and a low-resource language (Bangla). For English, we obtain results comparable to the state of the art, while for Bangla this is the first reported work, which can serve as a strong baseline for future research. We have made our developed Bangla dataset publicly available for the research community.
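A common way to frame this task, consistent with the abstract's description, is token classification: for each word in the unpunctuated ASR output, a transformer encoder predicts which punctuation mark (if any) should follow it. The sketch below illustrates that framing in Python with the Hugging Face transformers library; the label set, base model (bert-base-uncased), and word-to-mark mapping are illustrative assumptions rather than the paper's exact configuration, and the classification head would need fine-tuning on punctuation-annotated text before producing useful output.

    # Minimal sketch: punctuation restoration as token classification.
    # Assumptions (not the paper's exact setup): a BERT encoder, four
    # labels, and an untrained classification head.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]  # mark following each word

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(LABELS)
    )  # fine-tune on punctuation-annotated text before real use

    def restore(words):
        enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits[0]        # (seq_len, num_labels)
        preds = logits.argmax(-1).tolist()
        out, seen = [], set()
        for idx, word_id in enumerate(enc.word_ids()):
            if word_id is None or word_id in seen:
                continue  # skip special tokens and subword continuations
            seen.add(word_id)
            mark = {"COMMA": ",", "PERIOD": ".", "QUESTION": "?"}.get(
                LABELS[preds[idx]], ""
            )
            out.append(words[word_id] + mark)
        return " ".join(out)

    print(restore("how are you i am fine".split()))

In training, punctuation is stripped from clean text and recorded as per-word labels; at inference the predicted marks are re-attached to the ASR output, so the same pipeline applies to any language with suitable annotated data.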

Original language: English
Title of host publication: Proceedings of the 2020 EMNLP Workshop W-NUT: The Sixth Workshop on Noisy User-generated Text
Pages: 132-142
Publication status: Published - Nov 2020
