Sound command recognition

sakerdon's icon

Hello
Isn't there a way to control a patch by voice or sound?
I tried searching the forum but did not find much.
I'd like to record a "command", which could be a word or a sharp sound, and then be able to recognize that command later.
I tried playing around with jit.catch~ and jit.buffer~ but did not get anywhere.
I guess the technique should be somewhat similar to cv.jit.learn and cv.jit.undergrad's train and compare modes.
The signal volume can start the recognition; the command will be 1-2 seconds long.
thanks for any comments
Alex

Luis's icon

Hi Alex

To control patches with sound, the common parameter is amplitude detection, e.g. with [average~].
Establish a threshold and, once it is crossed, route the signal to a record command.
No recognition is 100% reliable, but...

Have you ever thought of controlling that with a touch sensor?
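The amplitude-gating idea above (an [average~]-style envelope plus a threshold that starts recording) can be sketched outside Max as well. This is a minimal Python stand-in; the window size and threshold value are illustrative assumptions, not values from the thread.

```python
# Hedged sketch of amplitude gating: a moving-average envelope follower
# (roughly what [average~] does) plus a threshold that would trigger
# "start recording". Window and threshold are illustrative.
import numpy as np

def detect_onset(signal, window=64, threshold=0.1):
    """Return the sample index where the smoothed amplitude first
    crosses `threshold`, or None if it never does."""
    envelope = np.convolve(np.abs(signal), np.ones(window) / window, mode="same")
    above = np.flatnonzero(envelope > threshold)
    return int(above[0]) if above.size else None

# A quiet stretch followed by a loud burst:
sig = np.concatenate([np.zeros(1000), 0.8 * np.sin(np.linspace(0, 100, 1000))])
onset = detect_onset(sig)  # lands near sample 1000, where the burst begins
```

In a patch the same role is played by a threshold test on [average~]'s output gating a record object.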

Best

Aly

sakerdon's icon

Thanks Aly
Well, it's clear, but I'd like to recognize several different commands, so the amplitude can only start the "recording" or the "recognizing" of commands.
I'm looking for a way to compare recorded and received data (a matrix, sound buffer, etc.) and accordingly send a bang to a certain object.
The touch sensor won't work: I need to use this in a live performance, where actors should just say a command that changes the screen content. A microphone is the easiest option in that case.
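The compare-and-bang idea Alex describes can be sketched abstractly: store one feature sequence per recorded command, then match incoming audio to the nearest template with dynamic time warping (DTW). This is a hedged illustration, not Max code; the frame-energy features and every name here are assumptions (a real system would use richer features such as MFCCs).

```python
# Toy template matcher: frame energies as features, DTW as the
# comparison, nearest stored template wins. All values illustrative.
import numpy as np

def frame_energies(signal, frame=256):
    n = len(signal) // frame
    return np.array([np.sum(signal[i*frame:(i+1)*frame] ** 2) for i in range(n)])

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW on 1-D feature sequences."""
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i-1] - b[j-1])
            cost[i, j] = d + min(cost[i-1, j], cost[i, j-1], cost[i-1, j-1])
    return cost[len(a), len(b)]

def recognize(signal, templates):
    """Return the name of the stored template closest to `signal`."""
    feats = frame_energies(signal)
    return min(templates, key=lambda name: dtw_distance(feats, templates[name]))

# Two "recorded commands" that differ in level, and an unseen input:
templates = {
    "loud": frame_energies(0.8 * np.ones(4096)),
    "soft": frame_energies(0.1 * np.ones(4096)),
}
match = recognize(0.7 * np.ones(4096), templates)
```

The winning template name would then drive the bang to the corresponding object.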

Alex

sakerdon's icon

here is an example of what I'm actually trying to do - the animated characters should "understand" a few different commands
http://vimeo.com/3111435

Navid's icon

You can use speech recognition externals if you need the patch to
"understand" language from incoming voice.
Try aka.listen for starters: http://www.iamas.ac.jp/~aka/max/#aka_listen

-Nav


sakerdon's icon

thanks Nav
This aka.listen only works on Mac, and I'm on Windows. Well, I looked through the forum and found that the story is not so simple.
It's strange - there are so many voice recognition apps around.
It would be nice to have some object in Max which could learn a couple of words and then compare them with what you say.
I guess that's much easier than full speech recognition.
Something like in cell phones. I don't need the computer to understand people talking, just to react to a very few commands.

any other pointers?
Alex

JBR's icon

Hi guys,

I'm totally new here, so bear with my ignorance!

Could any of you either answer, or point me to where I could find an answer, about speech recognition?
I've looked at the related discussions on the forum and they don't seem to answer my questions.

I'm trying to get the program to answer back to people's vocal comments and to link what people say with appropriate soundtracks and image projections. This would require the participants' speech to be recorded/written directly (or imported) into MAX MSP in real time. Now, the aka.listen object seems rather clumsy and is hard to work with (it does not recognize simple words easily).
Would you suggest that I work with other software like MacSpeech Dictate (or any other Mac-compatible program)? It can type directly into Word (and perhaps it can write into MAX MSP, I'm waiting for Nuance's answer on this), and I could input the text into MAX so that I may associate sounds with other commands. Or is it that I'm not using the aka.listen object properly? Does anyone know about MacSpeech Dictate? Do you know any easy way to achieve this, or where I can find examples?

Thanks for your help!

JBR

seejayjames's icon

JBR wrote on Sun, 08 March 2009 14:53:

Hi JBR, you might look at my previous post about Dragon Naturally Speaking, which I've had good luck with in Max. The drawback is that it's *too* accurate, that is, it has user files for each speaker, so it probably won't do well with general recognition of many different people. Still, it's a great program and I think it still has a time-limited demo. It will put text in any active text field in any application, including the textarea, text, and dialog boxes in Max. Probably the other apps will too. More details here:

Luke Hall's icon

I've used MacSpeech Dictate in this way. In fact, it uses the same speech recognition engine as Dragon Naturally Speaking. It works very well, but you could potentially run into the same problems as CJ described above.

Another way to achieve this on a Mac is to use the built-in voice recognition together with AppleScripts and Extra Suites, an AppleScript extension that extends the range of what you can do, including letting you send key presses.

1. Turn on "speakable items" from system preferences > speech > speech recognition.
2. Open max.
3. Open script editor and write a script like this:

tell application "MaxMSP" to activate
tell application "Extra Suites"
    ES type key "1"
end tell

4. Save it in Library > Speech > Speakable Items > Application Speakable Items > MaxMSP, and name the file whatever you want the voice command to be, for example "press one".
5. Now on the floating speech icon click the down arrow at the bottom and choose "Open Speech Commands window". With Max as the front-most application, check that the commands you just saved as AppleScripts have appeared in the MaxMSP folder.
6. Now simply hook up a [key] object in Max, press "escape" (or whichever key you have set up to turn speech recognition on), say "press one", and you should have [key] spit out "49"!
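The [key] object reports ASCII codes, which is why the "1" key arrives as 49. The routing a [select] object would then do in the patch boils down to a lookup; the command names and the second code here are illustrative assumptions.

```python
# Toy stand-in for routing [key] output to actions, as a [select] or
# [route] object would in the patch. Command names are illustrative.
COMMANDS = {49: "press one", 50: "press two"}  # ASCII codes for "1" and "2"

def dispatch(keycode):
    """Return the command bound to this key code, or None."""
    return COMMANDS.get(keycode, None)
```

Each spoken command thus ends up as one key code, and each key code as one action.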

Sorry about the lengthy explanation. I hope it makes sense to you and gives you another possible (and cheaper!) method of achieving your goals.

Oh and the applescript extension can be downloaded from: http://www.kanzu.com/

lh

JBR's icon

Dear forum, seejayjames and lh,

Thanks a lot for your precious help, I really appreciate it!
Now I'll report on my progress with the speech recognition issue so far.

I did buy MacSpeech Dictate and I can say that it works great within MAX MSP. For some reason, it seems more stable when writing into Text Edit within MAX than into its own text box, go figure! One of the problems I immediately encountered is the cheap-USB-mic limitation. The program does come with a headset mic, but if your goal is a sound installation like the one I'm doing, you'd prefer an external mic. The problem is that the program does not seem to recognize any port but USB (I tried the FireWire interface to no avail)! Anyway, to fix that I got myself an XLR-to-USB adapter called Mic Mate Pro (made by MXL), and that device works fantastically. I can record very good quality sound directly to my laptop with it, and it allows me to use the very good microphone that we have at school.

What I'm doing here is having MacSpeech Dictate write the speech into Text Edit within MAX and then having aka.speech read that text, which is then captured again by the mic, re-interpreted and re-written in MAX, and on and on, until someone in the audience says something else. It's kind of similar to what Alvin Lucier did with his famous "I Am Sitting in a Room." I'm using the variations of interpretation inherent in an "untrained" speech recognition program within a space where anyone can come and say anything in any language and still generate some interpretation by the program.
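The re-interpretation loop JBR describes (speak, recognize, re-speak, repeat) can be modeled abstractly. This toy sketch stands in for the whole audio chain with a deliberately lossy text "recognizer"; everything here is an illustration of degradation by iteration, not the actual MacSpeech/aka.speech setup.

```python
# Toy model of the feedback loop: each pass re-interprets the previous
# output through an imperfect channel. The "recognizer" is a stand-in
# that keeps only every other word.
def lossy_recognize(text):
    words = text.split()
    return " ".join(words[::2]) if len(words) > 1 else text

phrase = "i am sitting in a room different from the one you are in now"
history = [phrase]
for _ in range(4):
    history.append(lossy_recognize(history[-1]))
# The text erodes a little more on every pass, Lucier-style.
```

A real untrained recognizer degrades less predictably, which is exactly what makes the installation interesting.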

Now, the other problem I'm facing is the "aka.speech" object itself. First of all, it seems to be a bit fidgety; today I had to re-download the original object from the web, but it does work overall (don't get me wrong, I'm grateful for this great tool). What's giving me trouble is redirecting the aka object's speech to a 7.1 speaker array configuration; e.g., sending the sound to one or more speakers and alternating the sound's spatial distribution using either the urn or random objects (or both).

If I were more of a techie, I'd design my own speech patch; but there might be a way to open the aka.speech object and direct the sound differently from the current stereo default. Do any of you guys have an idea about this?
Thanks again for your help!

JBR

seejayjames's icon

JBR wrote on Mon, 11 May 2009 15:13

What's giving me trouble is to be able to redirect the aka object's speech to a 7.1 speaker array configuration; e.g., sending the sound to one or more speakers and alternate the sound spatial distribution using either the urn or random objects (or both).

Can you get aka.speech's output into Max using [adc~], by pointing the mic at the actual speaker output?

This is what you want to resample, correct? So the input from the mic would actually go to MacSpeech Dictate as well as Max, MacSpeech would figure out the text (as it gradually deteriorates) and put it into your textedit, while the audio would then be MSP signals which you can route easily to your setup, plus have whatever level of control/automation of the volumes in each channel (plus effects?... hehe) that you want.

No need to be a Max sound techie (though I do hope you will become one). If your hardware is all working, you can set up a control patch easily:

For 7.1, make 8 [gain~] faders and hook each one up to a [dac~ 1 2 3 4 5 6 7 8], one per channel. (Make sure the 8-channel sound card is recognized by Max and selected in the DSP Status window.) Also run each gain~ output through its own [*~] object (before the dac~, actually), then hook up one master slider/number box with output 0. to 1. to all of the [*~] boxes. This will let you fade all 8 channels at once while preserving the relative levels set by the 8 gain~ objects. It will be handy.

The easiest way to start experimenting with different speaker levels is to hook up a [preset] to all 8 gain~ objects and start saving some configurations of your relative levels. The gain~ inspector or the right inlet will let you set the fade time, which is super-short by default; if you set them longer, even a couple of seconds or more, you'll get smooth transitions between your presets. If you like what you're hearing with these controls, go further and look into [mtr] to record live automation, then into [pattr] for presets with interpolation, powerful stuff. Plus there are tons of other ways to control or generate levels, which are just number streams... Max's forte.
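The gain structure described above (eight per-channel gains feeding one master multiplier, so a master fade preserves relative levels) reduces to simple arithmetic. Here is a hedged numpy stand-in for the [gain~]/[*~]/[dac~] chain, with illustrative gain values:

```python
# Per-channel gains times one master gain: out[ch] = mono * gain[ch] * master.
# A pure-numpy stand-in for the 8-channel fader bank; values illustrative.
import numpy as np

def mix_to_channels(mono, channel_gains, master=1.0):
    """Fan a mono signal out to N channels with individual and master gains."""
    gains = np.asarray(channel_gains, dtype=float)
    return np.outer(gains * master, mono)   # shape: (n_channels, n_samples)

sig = np.sin(np.linspace(0, 2 * np.pi, 512))
gains = [1.0, 0.5, 0.25, 0.0, 0.8, 0.8, 0.1, 0.6]   # one per speaker
out = mix_to_channels(sig, gains, master=0.5)       # master fade to half
```

Because the master multiplies every channel equally, the ratio between any two channels is untouched by a master fade, which is exactly the point of the single shared [*~] stage.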

If you have a MIDI controller with knobs or faders this is a great time to use it to control your volume levels, it's easy with [ctlin].

One thought... not sure how big the risk of bad audio feedback is with this arrangement. I know feedback is an important part of the process, for the resampling, but use caution with where you aim the mic and how hot you set it... possibly you can rig up something where the 7.1 for the audience isn't picked up much by the mic; instead use a smaller monitor-type speaker near the computer where you can control the level/placement easily. Might not be an issue, though.

Enough of the technical details, really like the project idea and I hope it turns out great! I tried another experiment sort of like this (much simpler though) where I had Dragon Naturally Speaking running and I spoke whatever kinds of gibberish I could think of into it. Since it's got to come up with some string of real words to match, I got some really random and often hilarious results. I'd love to hear what it does with foreign languages too....sfrecord~!

5mg3's icon

Sorry Luke being a total n00b I do not fully follow!

What is the script editor (i feel really dumb asking this for some reason, i bet this is so obvious!)?

Then, how do I tell application "MaxMSP" to activate? Is this just 'Max' itself?

From then, please explain in more detail the following also (sorry);

tell application "Extra Suites"
    ES type key "1"
end tell

Luke Hall's icon

Download "Extra Suites" from the link in my original post. Then in Applications > AppleScript open the "Script Editor" program. Paste in the four lines of code and then save it as explained above. If you want multiple spoken commands, you'll need to duplicate the file and change the line:

ES type key "1"

replacing the 1 with another number or key you want to be triggered by the spoken command. If you follow the rest of the instructions you should be good to go. If you have any more problems or this doesn't explain it let me know.

lh

5mg3's icon

Forgive my ignorance Luke, but every time I follow the instructions you posted I get this response from the computer:

The document “Untitled” could not be saved as “press one”.

?? Can you shed any light on what I may be doing wrong?

Cheers man,
a

Steve Belovarich's icon

I attempted to set up speech recognition using the built-in option on the Mac, and I was successful. Thank you Luke Hall for the details. I thought I could just trigger the speech recognition via a key press, but it seems OS X locks that key away from any other application. For instance, if I choose ESC in the Speech preferences, the [key] object in Max does not register that the ESC key has been pressed. I would like Max to trigger the speech recognition in addition to having the speech recognition trigger something in Max. So I looked into the aka.keyboard object, but it faces the same limitation: the object cannot send the same key press that the speech recognition is set to.

Any suggestions? I thought of having Max trigger an AppleScript that has all the commands built in for the speech recognition server. Perhaps I'll just have the AppleScript running alongside Max, passing it key presses from recognized words.

seejayjames's icon

If you have any MIDI controllers hooked up, the commands from these won't be limited on a per-app basis. So regardless of which app is active, you can get the MIDI commands heard by Max.

Failing that, there are some java solutions, these allow more flexibility with key presses and mouse control (even into the danger zones of control keys and auto-clicking...)

Curious to hear how the sound-command recognition is coming along!

Luke Hall's icon

I just checked this out. You can do it with the wonderful [mxj autobot] that was posted to the forum a while back. If you leave the escape key as the listening key in the Speech preferences, then you can send "keydown 27" and "keyup 27" messages to [mxj autobot] in Max and it will turn the speech-command listening on and off programmatically. Awesome!

lh

Steve Belovarich's icon

Luke,

You just made my day. Thank you!

Here is a post that links to mxj.autobot:

Steve

Steve Belovarich's icon

After some testing, this may not work for what I was hoping. I am creating an installation where the participant speaks into a microphone. I want to analyze the recording and produce visuals based on certain keywords. Apple's voice recog. doesn't understand keywords in the middle of sentences. Bummer.

Macciza's icon

I am pretty sure speech recognition has been done by porting an open-source Java speech program to Max.
I will see if I can find anything more in my archives...

Rob Ramirez's icon

However, I sort of doubt you will ever get it to recognize commands in the middle of someone's sentence; that's not really what these tools were designed for.

Macciza's icon

Hi
Check out op.recognise - Community/Project 35 - it should be able to help you out...
And you would be surprised what can be achieved these days with voice recognition.
Getting a particular word out should not be difficult; it's easier than working out emotional context...

osc2's icon

Hi Guys,

I am doing research on speech recognition and am still a novice.
This thread seems really useful.

I cannot find "Extra Suites". Can anybody please suggest an alternative, or tell me where I can get "Extra Suites" now?

Thanks

Yoann's icon