Abstract
This study proposes a new non-intrusive measure of speech quality, the neurogram speech quality measure (NSQM), based on the responses of a biologically inspired computational model of the auditory system for listeners with normal hearing. The model simulates the response of an auditory-nerve fiber with a given characteristic frequency to a speech signal, and the population response across characteristic frequencies is represented by a neurogram (a 2D time-frequency representation). The response at each characteristic frequency in the neurogram was decomposed into sub-bands using a 1D discrete wavelet transform. The normalized energy of each sub-band was used as an input to a support vector regression model to predict the quality score of the processed speech. The performance of the proposed non-intrusive measure was compared with that of a range of intrusive and non-intrusive measures on three standard databases: EXP1 and EXP3 of Supplement 23 to the P series (P.Supp23) of ITU-T Recommendations, and the NOIZEUS database. Overall, the proposed NSQM outperformed most of the existing metrics across the effects of compression codecs, additive noise, and channel noise.
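The feature-extraction step described in the abstract (a 1D wavelet decomposition of each characteristic-frequency row, followed by normalized sub-band energies) can be sketched as below. This is a minimal illustration, not the authors' implementation: the Haar wavelet, the three decomposition levels, the function names, and the toy neurogram are all assumptions, since the abstract does not specify the wavelet family or the decomposition depth. The auditory-nerve model and the support vector regression stage are not reproduced here.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length inputs by repeating the last sample (an assumption).
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def subband_energies(row, levels=3):
    """Normalized energy per sub-band for one characteristic-frequency row.

    Returns [approx_L, detail_L, ..., detail_1]; the values sum to 1
    unless the row carries no energy, in which case all values are 0.
    """
    approx = list(row)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    bands = [approx] + details[::-1]
    energies = [sum(x * x for x in band) for band in bands]
    total = sum(energies)
    return [e / total if total > 0 else 0.0 for e in energies]


def neurogram_features(neurogram, levels=3):
    """Concatenate the normalized sub-band energies of every neurogram row."""
    features = []
    for row in neurogram:
        features.extend(subband_energies(row, levels))
    return features


# Toy neurogram (hypothetical data): two characteristic-frequency rows.
toy_neurogram = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],      # slowly varying response
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # rapidly varying response
]
features = neurogram_features(toy_neurogram, levels=3)
```

In this toy case the constant row places all of its energy in the coarsest approximation band, while the alternating row places all of it in the finest detail band. In a pipeline like the one the abstract describes, a feature vector of this kind would be passed to a regression model trained against subjective quality scores.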
| Original language | English |
|---|---|
| Pages (from-to) | 260-279 |
| Number of pages | 20 |
| Journal | Computer Speech and Language |
| Volume | 58 |
| Publication status | Published - Nov 2019 |
| Externally published | Yes |
Keywords
- Auditory-nerve model
- Discrete wavelet transform
- Neurogram
- PESQ
- POLQA
- Speech quality assessment