Abstract—An effective voice activity detection (VAD) algorithm
is proposed for improving speech recognition performance in
noisy environments. The approach is based on the determination
of the speech/nonspeech divergence by means of specialized order
statistics filters (OSFs) working on the subband log-energies.
This algorithm differs from many others in the way the decision
rule is formulated. Instead of making the decision based on the
current frame alone, it applies OSFs to the subband log-energies,
which significantly reduces the error probability when discriminating
speech from nonspeech in a noisy signal. Clear improvements
in speech/nonspeech discrimination accuracy demonstrate the
effectiveness of the proposed VAD. It is shown that increasing
the OSF order leads to a better separation of the speech and noise
distributions, thus allowing a more effective discrimination and
a tradeoff between complexity and performance. The algorithm
also incorporates a noise reduction block that precedes the VAD,
works in tandem with it, and is shown to further improve its
accuracy in detecting
speech and nonspeech. The experimental analysis carried out on
the AURORA databases and tasks provides an extensive performance
evaluation together with an exhaustive comparison to the
standard VADs such as ITU G.729, GSM AMR, and ETSI AFE for
distributed speech recognition (DSR), and other recently reported
VADs.
Index Terms—Noise reduction, robust speech recognition,
speech/nonspeech detection, subband order statistics filters.
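
The idea summarized in the abstract, a decision driven by order statistics of the subband log-energies rather than by the current frame alone, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the decision rule, so the names and parameters below (order_statistic, osf_track, osf_vad, half_window, p, threshold, noise_floor) are hypothetical. The divergence measure is likewise an assumption, taken here as the mean over subbands of the filtered log-energy minus a noise level.

```python
from typing import List, Sequence


def order_statistic(window: Sequence[float], p: float) -> float:
    """Return the p-quantile (0 <= p <= 1) of a window, interpolating
    linearly between the two neighboring order statistics."""
    ordered = sorted(window)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return (1.0 - frac) * ordered[lo] + frac * ordered[hi]


def osf_track(log_energy: Sequence[float], half_window: int, p: float) -> List[float]:
    """Apply an order statistics filter along time to one subband.
    The window spans up to 2 * half_window + 1 frames and is
    truncated at the edges of the signal."""
    n = len(log_energy)
    out = []
    for t in range(n):
        start = max(0, t - half_window)
        stop = min(n, t + half_window + 1)
        out.append(order_statistic(log_energy[start:stop], p))
    return out


def osf_vad(
    subbands: Sequence[Sequence[float]],
    noise_floor: Sequence[float],
    half_window: int = 2,
    p: float = 0.5,
    threshold: float = 3.0,
) -> List[bool]:
    """Flag each frame as speech (True) or nonspeech (False).

    subbands[k][t] is the log-energy of subband k at frame t and
    noise_floor[k] is an assumed noise log-energy for subband k.
    The decision rule (mean filtered divergence above a fixed
    threshold) is an illustrative assumption.
    """
    filtered = [osf_track(band, half_window, p) for band in subbands]
    n_bands = len(subbands)
    decisions = []
    for t in range(len(subbands[0])):
        divergence = sum(filtered[k][t] - noise_floor[k] for k in range(n_bands)) / n_bands
        decisions.append(divergence > threshold)
    return decisions
```

With half_window=0 the filter reduces to a frame-by-frame decision, so an isolated noise spike in one subband can be flagged as speech. A median over three frames (half_window=1, p=0.5) suppresses such a spike while still detecting a burst that lasts several frames. A larger window costs more computation per frame, which is the complexity/performance tradeoff the abstract refers to.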