Live speech recognition with Max?
Hi there,
CMU Sphinx seems to be the way to go outside of Max, but I couldn't find an implementation of it for Max.
Is there one?
Or should I use software outside of Max that would send detected words into Max via OSC or something similar?
Any ideas would be appreciated.
There is mxj op.recognize, which is the only thing of the kind I'm aware of, and it is probably far behind CMU Sphinx.
Hi vichug,
I just tested it. It is apparently based on Sphinx.
But it doesn't seem to work well, probably because I'm not an expert and I'm not using it properly...
I think I'd delegate that part (speech to text) to something outside Max, actually.
Why not a pure Sphinx-based app...
Digging...
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4 explains some basics.
Maybe I should dig around Processing for that...
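For the "recognizer outside Max, words sent in via OSC" idea, here is a minimal sketch in Python of the sending side only. It assumes some external recognizer (Sphinx or anything else) has already produced a word as a string. The address /speech/word, the port 7400, and the function names are made up for illustration. The encoding follows the OSC 1.0 layout (null-terminated strings padded to 4 bytes), which a [udpreceive 7400] object in Max should be able to parse.

```python
import socket


def osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII bytes, null-terminated, padded to a multiple of 4."""
    data = text.encode("ascii") + b"\x00"
    padding = (4 - len(data) % 4) % 4
    return data + b"\x00" * padding


def osc_message(address: str, word: str) -> bytes:
    """Build an OSC message with one string argument (type tag ',s')."""
    return osc_string(address) + osc_string(",s") + osc_string(word)


def send_word(word: str, host: str = "127.0.0.1", port: int = 7400) -> None:
    """Send one recognized word to Max over UDP.

    The address '/speech/word' and port 7400 are assumptions for this sketch;
    use whatever your Max patch listens on.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message("/speech/word", word), (host, port))
```

Calling send_word("hello") each time the recognizer returns a word should make the word show up in Max as a message starting with /speech/word, which can then be routed with [route /speech/word].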
My dp.kinect external supports speech recognition: hidale.com/dp-kinect/ It uses the Microsoft Kinect.
One of the features of the Kinect that not many people know about is its very advanced microphone array. It allows the Kinect to form a directional sound beam, which can be used to focus the Kinect on specific areas or people and hear them at great distances.
dp.kinect then uses this Kinect microphone array and the Microsoft Kinect SDK to allow for speech recognition.
Here is the part of the dp.kinect wiki on speech recognition: https://github.com/diablodale/dp.kinect/wiki/Message-based-Data#speech-recognition
This is really interesting and it seems powerful!
Does it mean I can capture sound in real time and have the dp.kinect object output... text?
Yes, in real time and from meters away from the Kinect. The recognized speech is output as OSC or Max messages.
The default Kinect settings are tuned for command-response scenarios like "turn on" or "increase volume", where commands are separated by long pauses. You *can* change the dp.kinect attributes to make it respond much more quickly, without pauses, using attributes like @silenceprecise and @silencevague.
You can build simple or complex grammars using the GRXML standard, and speak in many languages such as French, Spanish, Japanese, English, and German.
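To give an idea of what a simple grammar looks like, here is a rough sketch in Python that generates one, assuming GRXML here means the W3C SRGS XML grammar format. The function name build_grammar, the rule id "command", and the phrase list are made up for illustration. Whether dp.kinect needs extra attributes or semantic tags beyond this is not shown here, so check the wiki page linked above before relying on it.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_grammar(phrases, lang="en-US", rule_id="command"):
    """Return an SRGS XML grammar whose root rule matches any one of the phrases."""
    # Serialize SRGS elements without a prefix (as the default namespace).
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang, "root": rule_id},
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")


# Hypothetical command set, taken from the examples mentioned above.
grammar_xml = build_grammar(["turn on", "increase volume"])
```

Writing grammar_xml to a .grxml file gives a single rule with a list of alternatives, which is about the simplest grammar possible. More complex grammars nest rules and reference them from each other.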
Sounds REALLY amazing!
Thanks a lot for your answer & work!!!
Hi @DiabloDale, this looks great. So is it only for Windows users? Any alternative for Mac?
Thanks
It is Windows only. If you want to use Apple hardware, you could try using Boot Camp.
Microsoft has released the Kinect SDK only for Windows. I do not expect that to change anytime soon. Therefore dp.kinect and dp.kinect2 work only on Windows.