TY - GEN
T1 - Multilingual Word Error Rate Estimation
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
AU - Chowdhury, Shammur Absar
AU - Ali, Ahmed
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - The success of multilingual automatic speech recognition systems has empowered many voice-driven applications. However, measuring the performance of such systems remains a major challenge, due to its dependence on manually transcribed speech data in both mono- and multilingual scenarios. In this paper, we propose a novel multilingual framework - eWER3 - jointly trained on acoustic and lexical representations to estimate word error rate. We demonstrate the effectiveness of eWER3 to (i) predict WER without using any internal states from the ASR and (ii) use the multilingual shared latent space to push the performance of closely related languages. We show that our proposed multilingual model outperforms the previous monolingual word error rate estimation method (eWER2) by an absolute 9% increase in Pearson correlation coefficient (PCC), with better overall estimation between the predicted and reference WER.
KW - End-to-End systems
KW - Multilingual WER estimation
UR - https://www.scopus.com/pages/publications/85174878749
U2 - 10.1109/ICASSP49357.2023.10095888
DO - 10.1109/ICASSP49357.2023.10095888
M3 - Conference contribution
AN - SCOPUS:85174878749
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 June 2023 through 10 June 2023
ER -