Far-field automatic speech recognition (ASR) is challenging, mainly because of the strong reverberation in the recordings. A novel linear sparse prediction model has been proposed to estimate and suppress reverberation. This model treats reverberation as a mixture of early and late reflections of the direct signal and estimates the late reflections with Lasso. The approach has shown promise in improving perceptual intelligibility; however, it is unknown whether the improvement carries over to ASR tasks.
This paper applies the Lasso-based dereverberation approach to far-field speech recognition and shows that it can deliver significant performance improvements for ASR based on deep neural networks (DNNs). In particular, we demonstrate that an utterance-based Lasso is sufficient to obtain good performance, which is important for applying Lasso-based dereverberation to real-time ASR systems.
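To make the idea of Lasso-based late-reflection estimation concrete, the following is a minimal sketch rather than the method as implemented in this work. It assumes a single real-valued sequence `x` (for instance, one subband trajectory or a short time-domain signal), predicts each sample from a delayed window of past samples, fits the prediction weights once over the whole utterance with an L1 penalty, and subtracts the prediction as the estimated late reflection. The function names, the parameters `delay`, `order` and `lam`, and the plain coordinate-descent solver are all illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator, the proximal step of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise 0.5 * ||y - X w||^2 + lam * ||w||_1 by coordinate descent.

    A generic textbook solver, used here only to keep the sketch
    dependency-free; it is not claimed to be the solver used in the paper.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - X w, with w initialised to zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def suppress_late_reflection(x, delay, order, lam):
    """Utterance-level sketch: fit one sparse predictor, subtract its output.

    x[t] is predicted from x[t - delay], ..., x[t - delay - order + 1];
    the delay (an assumed parameter) skips the direct signal and early
    reflections so that only the late part is modelled and removed.
    """
    start = delay + order - 1
    X = [[x[t - delay - k] for k in range(order)] for t in range(start, len(x))]
    y = x[start:]
    w = lasso_cd(X, y, lam)
    out = list(x)  # the first `start` samples are left unchanged
    for row, t in zip(X, range(start, len(x))):
        out[t] = x[t] - sum(wk * xk for wk, xk in zip(w, row))
    return out, w
```

Calling `suppress_late_reflection(x, delay=5, order=4, lam=30.0)` on a list of floats returns the processed sequence and the fitted weights. Because the weights are estimated once from the full sequence, this corresponds to the utterance-based setting; the values of `delay`, `order` and `lam` here are placeholders that would need tuning on real data.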