Saturday, July 14, 2018
Face Recognizers, Bloom Filters and Application to Speech Recognition
Some time ago I read the core face detection paper by Viola and Jones on Haar cascades for object detection. It struck me that their method, which proved very fruitful in face and object detection, never made it into common practice in speech recognition.
The basic idea of their method is that a set of very weak classifiers can shrink the search space significantly. For example, you can easily tell that there is no face in a patch of green grass and skip that region. The fruitful observation is that negatives can be classified much more accurately than positives. Arranging the classifiers in a cascade makes the search space tiny and recognition fast and efficient. It is certainly not the only algorithm of this type; another one I came across recently is the Bloom filter, which uses almost the same approach for efficient hash search: a negative answer is definite, so most lookups can be rejected cheaply.
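To make the "cheap, reliable negatives" point concrete, here is a minimal Bloom filter sketch in Python. The class name, the bit-array size, the number of hashes, and the `lookup` helper are all my own illustrative choices, not taken from any particular library; the salted SHA-256 hashing is just one simple way to get several hash positions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a 'no' is definite, a 'yes' is only probable."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit: wasteful, but keeps the sketch readable.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # If any bit is unset, the item was definitely never added.
        return all(self.bits[pos] for pos in self._positions(item))


def lookup(item, bloom, expensive_store):
    """Hypothetical helper: consult the costly store only if the filter says 'maybe'."""
    if not bloom.might_contain(item):
        return False  # cheap, certain rejection
    return item in expensive_store  # costly check confirms the 'maybe'
```

The filter plays the same role as an early cascade stage: it never throws away a true positive, and it is allowed to pass some false positives along to the more expensive check.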
Transferring this to ASR is rather straightforward. We need to train weak classifiers that reject phone hypotheses for a given set of frames. That is actually quite easy with an SVM or something built on top of an existing HMM segmentation. Next, we could apply the same idea to the language model and reject hypotheses that aren't possible in the language.
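A rough sketch of what such a rejection cascade could look like, in Python. Everything here is an assumption made for illustration: the two per-frame features (energy and zero-crossing rate, both scaled to [0, 1]), the phone sets, and the hand-picked thresholds, which stand in for the trained weak classifiers (an SVM, for example) mentioned above.

```python
from statistics import mean

# Hypothetical phone groups, chosen only for this example.
VOWELS = {"AA", "IY", "UW"}
FRICATIVES = {"S", "SH", "F"}

# Each frame is an assumed (energy, zero_crossing_rate) pair in [0, 1].


def reject_vowel_in_silence(phone, frames):
    # Stand-in weak rejector: a vowel is implausible in a near-silent segment.
    return phone in VOWELS and mean(e for e, _ in frames) < 0.1


def reject_fricative_without_noise(phone, frames):
    # Stand-in weak rejector: a fricative is implausible without noisy frames.
    return phone in FRICATIVES and mean(z for _, z in frames) < 0.2


# Cheapest stages first; thresholds are illustrative, not tuned values.
CASCADE = [reject_vowel_in_silence, reject_fricative_without_noise]


def surviving_phones(candidates, frames, cascade=CASCADE):
    """Return the phone hypotheses that no stage of the cascade rejects."""
    survivors = []
    for phone in candidates:
        # any() stops at the first stage that rejects this hypothesis.
        if any(stage(phone, frames) for stage in cascade):
            continue
        survivors.append(phone)
    return survivors
```

Only the survivors would then be scored by the full acoustic model, so each stage needs to be cheap and to very rarely reject a correct hypothesis; it does not need to be accurate on positives.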
I haven't seen any papers on this; I probably need to search more. The idea is certainly worth trying, and it should make its way into common ASR practice alongside discriminative training, adaptation with linear regression, and multipass search.