Live speech recognition with Max?

Julien Bayle

Hi there,
CMU Sphinx seems to be the way to go outside of Max, but I couldn't find an implementation for Max.

Is there one?
Or should I use software outside of Max that sends detected words into Max via OSC or something similar?

Any ideas would be appreciated.

vichug

There is mxj op.recognize, which is the only thing of its kind I'm aware of, though it is probably far behind CMU Sphinx.

Julien Bayle

Hi vichug,
I just tested it. It is apparently based on Sphinx.
But it doesn't seem to work well, probably because I'm not an expert and I'm not using it correctly...

Actually, I think I'd delegate that part (speech to text) to something outside Max.
Why not a pure Sphinx-based app...
Digging...

http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4 explains some basics.
Maybe I should dig around Processing for that...
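
A minimal sketch of the outside-of-Max route: a recognizer running as a separate program (Sphinx or anything else) packs each detected word into an OSC message and sends it over UDP. The encoder below follows the OSC 1.0 layout (null-padded address and type-tag strings, big-endian 32-bit arguments) and uses only the Python standard library. The address `/speech/word`, port 7400, the confidence argument and the function names are all hypothetical choices for illustration, not part of Sphinx or Max.

```python
import socket
import struct


def _osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 requires for strings."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc_message(address: str, *args) -> bytes:
    """Encode an OSC 1.0 message with str, int (int32) and float (float32) arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, str):
            type_tags += "s"
            payload += _osc_pad(arg.encode("utf-8"))
        elif isinstance(arg, bool):
            # bool is a subclass of int; reject it rather than silently sending 0/1.
            raise TypeError("bool arguments are not supported in this sketch")
        elif isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg).__name__}")
    return _osc_pad(address.encode("utf-8")) + _osc_pad(type_tags.encode("ascii")) + payload


def send_word(word: str, confidence: float, host: str = "127.0.0.1", port: int = 7400) -> None:
    """Send one recognized word to Max (hypothetical address '/speech/word', port 7400)."""
    packet = encode_osc_message("/speech/word", word, confidence)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

On the Max side, a `[udpreceive 7400]` object should turn such packets into Max messages that `[route /speech/word]` can dispatch. Calling `send_word("hello", 0.9)` from the recognizer's result callback would be one way to wire it up; how to obtain words from Sphinx itself is not shown here.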

diablodale

My dp.kinect external supports speech recognition (hidale.com/dp-kinect/). It uses the Microsoft Kinect.
One feature of the Kinect that not many people know about is its very advanced microphone array. It lets the Kinect form a directional microphone beam, which can be used to focus on specific areas or people and hear them from several meters away.
dp.kinect uses this microphone array together with the Microsoft Kinect SDK to perform speech recognition.
The speech recognition section of the dp.kinect wiki is here: https://github.com/diablodale/dp.kinect/wiki/Message-based-Data#speech-recognition
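
The thread doesn't say how the Kinect SDK forms its beam, so the following is only a generic illustration of the idea using delay-and-sum beamforming, a common textbook technique and not necessarily Microsoft's. Each microphone hears the target slightly earlier or later depending on its position; shifting every channel by that delay and averaging reinforces sound from the steered direction, while sound from elsewhere adds up less coherently. Integer sample delays and the function name are simplifying assumptions.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Steer a microphone array by delay-and-sum (generic sketch, not the Kinect SDK).

    channels[m][n] is sample n from microphone m; delays[m] is how many samples
    later the target source reaches microphone m than the earliest microphone.
    Each channel is advanced by its delay so the target lines up, then the
    aligned channels are averaged.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative sample counts")
    # Keep only the samples that exist on every channel after alignment.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]
```

For example, with a click reaching the second microphone one sample after the first, `delay_and_sum([[0, 1, 0, 0, 0], [0, 0, 1, 0, 0]], [0, 1])` returns `[0.0, 1.0, 0.0, 0.0]`, while steering with delays `[0, 0]` smears the click into two half-height samples.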

Julien Bayle

This is really interesting, and it seems powerful!
Does that mean I can capture sound in real time and have the dp.kinect object output... text?

diablodale

Yes, in real time, even several meters away from the Kinect. The text is output as OSC or Max messages.
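
If dp.kinect's OSC output is forwarded to another program (for instance via a `[udpsend]` object in Max), that program has to decode the OSC packets. Below is a minimal sketch of a decoder for single OSC 1.0 messages with string, int32 and float32 arguments; bundles and other type tags are deliberately rejected. The actual addresses and argument layout dp.kinect uses for speech results are documented in its wiki and are not assumed here; the function names are hypothetical.

```python
import struct
from typing import List, Tuple, Union

OscArg = Union[str, int, float]


def _read_padded_string(data: bytes, offset: int) -> Tuple[str, int]:
    """Read a null-terminated OSC string; return it and the next 4-byte-aligned offset."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("utf-8")
    next_offset = end + 1
    next_offset += -next_offset % 4
    return text, next_offset


def decode_osc_message(data: bytes) -> Tuple[str, List[OscArg]]:
    """Decode a single OSC 1.0 message (not a bundle) with s, i and f arguments."""
    address, offset = _read_padded_string(data, 0)
    if address.startswith("#bundle"):
        raise ValueError("OSC bundles are not handled in this sketch")
    tags, offset = _read_padded_string(data, offset)
    if not tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args: List[OscArg] = []
    for tag in tags[1:]:
        if tag == "s":
            value, offset = _read_padded_string(data, offset)
            args.append(value)
        elif tag == "i":
            args.append(struct.unpack_from(">i", data, offset)[0])
            offset += 4
        elif tag == "f":
            args.append(struct.unpack_from(">f", data, offset)[0])
            offset += 4
        else:
            raise ValueError(f"unsupported OSC type tag: {tag!r}")
    return address, args
```

A receiving loop might bind a UDP socket and call `decode_osc_message(sock.recv(65536))` for each datagram.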

The default Kinect settings are tuned for command-response scenarios such as "turn on" or "increase volume", where commands are separated by long pauses. You *can* change dp.kinect attributes such as @silenceprecise and @silencevague to make it respond much more quickly, without waiting for pauses.
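
Why shorter silence settings make recognition respond faster can be illustrated with a generic endpointing sketch: an utterance is only considered finished after a run of silent frames lasting a chosen timeout. This is not how dp.kinect or the Kinect SDK implement @silenceprecise and @silencevague (their exact semantics are in the dp.kinect wiki); the function, the frame size and the per-frame voice-activity flags are hypothetical inputs.

```python
from typing import List, Optional, Sequence, Tuple


def segment_utterances(
    is_speech: Sequence[bool], frame_ms: int, silence_timeout_ms: int
) -> List[Tuple[int, int]]:
    """Split per-frame voice-activity flags into (start, end) frame ranges.

    An utterance is closed only after `silence_timeout_ms` of continuous
    silence; `end` is exclusive and points just past the last speech frame.
    """
    timeout_frames = max(1, -(-silence_timeout_ms // frame_ms))  # ceiling division
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_speech = -1
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= timeout_frames:
            # Enough trailing silence: finalize the current utterance here.
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        # Input ended mid-utterance: close it at the last speech frame.
        segments.append((start, last_speech + 1))
    return segments
```

With 100 ms frames and the pattern speech, speech, silence, silence, speech followed by five silent frames, a 150 ms timeout yields `[(0, 2), (4, 5)]` and finalizes the first utterance at frame 3; a 300 ms timeout waits longer and merges everything into `[(0, 5)]`. A short timeout gives quicker responses but risks splitting a phrase at a natural pause.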

You can build simple or complex grammars using the GRXML standard, in many languages such as French, Spanish, Japanese, English and German.
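
GRXML is the XML form of the W3C Speech Recognition Grammar Specification (SRGS). As a sketch, the function below generates a minimal command grammar whose root rule matches exactly one phrase from a list, using Python's standard `xml.etree.ElementTree`. The function name, rule id and phrases are illustrative; how dp.kinect loads a grammar, and which SRGS features it accepts, is described in its wiki and not assumed here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_command_grammar(
    phrases: Iterable[str], lang: str = "en-US", rule_id: str = "commands"
) -> str:
    """Return an SRGS (GRXML) grammar whose root rule matches exactly one of `phrases`."""
    # Serialize SRGS elements with a default xmlns instead of an "ns0:" prefix.
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang, "root": rule_id},
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")
```

For example, `build_command_grammar(["turn on", "increase volume"])` produces a grammar matching either command; passing `lang="fr-FR"` with French phrases would produce a French grammar with the same structure.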

Julien Bayle

Sounds REALLY amazing!
Thanks a lot for your answer and your work!

Federico Llach

Hi @DiabloDale, this looks great. So is it only for Windows users? Is there any alternative for Mac?
Thanks

diablodale

It is Windows only. If you want to use Apple hardware, you could try Boot Camp.
Microsoft has released the Kinect SDK only for Windows, and I do not expect that to change anytime soon. Therefore dp.kinect and dp.kinect2 work only on Windows.