op.recognize

Speech recognition and segmentation. op.recognize is a speech-to-text object based on Sphinx4. It translates the incoming audio signal into text and can report the time positions of words and phonemes in a buffer~. It uses JSGF grammar files and n-gram language models. No speaker-specific training is needed, and it is potentially multilingual.

Olivier Pasquet

The right link is:
http://www.opasquet.fr/dl/recognize.zip

spacewülf

o nice! thank you so much for working on this project! i was going to try to implement an earlier speech to text program i had found (but never tested) in a performance piece coming in april. I'll get into this in march, and i'll definitely link you the results.
thank you so much!

Olivier Pasquet

There were too many downloads from my website. You can now download it from:
http://rapidshare.com/files/344796157/recognize.zip

pasquet

new update :
http://rapidshare.com/files/350360656/recognize.zip

Gilson

Hi pasquet!
Thanks for your implementation.

I've installed op.recognize and when i try to load a text file, the message "op.recognize-> error allocating" shows up in the Max window. What's going on?

I couldn't solve this problem, maybe you know what is wrong ;-(

regards

Olivier Pasquet

Hello !

I made the object print this error message when something goes wrong while loading data.
It seems you are using files with the right extension.

Make sure you have installed "sphinx4max.jar" AND "jsapi.jar" in your Max java/lib folder.
If yes, what did you write in the .gram text file you loaded? You should have a look inside the digits.gram example. Maybe the problem is there.
Another question : are you using the example file or did you make a new one ?

best,

Olivier Pasquet

New version with bug fixes and more efficiency :
http://rapidshare.com/files/358682924/recognize.zip

Gokce Kinayoglu

Hi Olivier,
Wonderful and very useful implementation!
I'm experiencing the same problem as Gilson. Getting "op.recognize-> error allocating" message when I try to load the example .gram and .lm files. This happens on a Windows system. I also tested on Mac and everything seems to work fine there. Any ideas?

Thanks,

Olivier Pasquet

hello,

Oohoh really ?
I'll quickly have a look if there is a problem on Windows.

Are you sure you put the jar files into the lib folder and accepted the Sun license conditions when running the shell script?
You can find the installation information in the read_me.txt file.

Rob

I am having trouble unpacking it. I have Windows, so this Terminal.app thing doesn't apply. I've moved the files to the classes and lib folders in the java folder....

Now what dooo I do? :(
Kindest Regards,

Rob

I've got a little further, what is this error?

(mxj~) Class op.recognize is not a subclass of com/cycling74/msp/MSPObject

batman2011

There is an error in the max window saying "wrong arguments"... ?

Olivier Pasquet

Hello Rob,

You are using mxj~ instead of mxj. I guess that is the reason.

best

Olivier Pasquet

Hello batman,

You probably do not use the right arguments. Do you have this problem when you load the help patch ?

best

batman2011

Thank you for your reply, I am attempting to make an installation for a university project! It says that there is no help file for op.recognize when I press "alt" and click the mxj op.recognize object. In fact, when I type "mxj op.recognize", the Max window says "wrong arguments" straight away, before I have even done anything!

Regards

hadowa

Hey Olivier!

First of all, you did a great job on that object. It works very well.

Is it possible to make the speech recognition in real time? I need it to sync a text written behind an actor with the actor's voice.

Also, I am German. Do you know where I can find a German dictionary to feed your object with?

kind regards,
Heiko

osc2

Hi Olivier,

This is indeed a great work, and a big resource.
I have installed it but am not able to make it work.

How do I give a voice input ?
I am a novice, just started my research on speech recognition.

It'll be really helpful if you can tell me how exactly this works.

Thanks.

Olivier Pasquet

> Is it possible to make the speech recognition in real time? I need it to sync a text written behind an actor with the actor's voice.

Yes and no. :)
Since the object does not process a continuous audio stream, you have to find a way to stop the recording and then run the recognition. I used 2 buffers in parallel. The most difficult part of the job is finding the right moment to stop the recording. An envelope follower, maybe?
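
One way to picture the "when to stop recording" problem is an envelope follower with a silence timeout. The sketch below is only an illustration in plain JavaScript and is not part of op.recognize: the function name, the threshold, the release coefficient and the hold time are all assumed values that would need tuning, and in a real patch the same logic would more likely be built from MSP objects.

```javascript
// Hypothetical helper, not part of op.recognize: find the sample index at which
// a recording could be stopped, i.e. where the envelope has stayed below
// `threshold` for `holdSamples` consecutive samples after some sound was heard.
function findStopIndex(samples, { threshold = 0.05, release = 0.999, holdSamples = 8000 } = {}) {
  let envelope = 0;        // current envelope value
  let heardSound = false;  // true once the envelope has crossed the threshold
  let quietCount = 0;      // consecutive samples spent below the threshold

  for (let i = 0; i < samples.length; i++) {
    const level = Math.abs(samples[i]);
    // Peak follower: rise instantly, decay exponentially.
    envelope = level > envelope ? level : envelope * release;

    if (envelope >= threshold) {
      heardSound = true;
      quietCount = 0;
    } else if (heardSound) {
      quietCount++;
      if (quietCount >= holdSamples) return i;
    }
  }
  return -1; // no end of utterance found in this block of samples
}
```

A return value of -1 means "keep recording"; any other value is a candidate point at which to stop record~ and bang the object.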

Olivier Pasquet

> How do I give a voice input ?

You have to use buffer~ and record~. record~ records the audio into the buffer~. When you have finished recording, you stop the recording and bang the object.
I hope this helps.

Olivier Pasquet

> In-fact, when I type “mxj op.recognize”, in the max window it says straight away “wrong arguments” before I have even done anything!

You need to give it at least one argument: the name of the buffer~ it is associated with.

metameta

I'm getting op.recognize-> error allocating when trying to load any of the example language files. I am on windows. Copied the right files to the classes and lib dirs. Anyone figure this out yet?

This looks good right?
MXJ System CLASSPATH:
C:\Program Files (x86)\Cycling '74\Max 6.1\Cycling '74\java\lib\jitter.jar
C:\Program Files (x86)\Cycling '74\Max 6.1\Cycling '74\java\lib\jode-1.1.2-pre-embedded.jar
C:\Program Files (x86)\Cycling '74\Max 6.1\Cycling '74\java\lib\jsapi.jar
C:\Program Files (x86)\Cycling '74\Max 6.1\Cycling '74\java\lib\max.jar
C:\Program Files (x86)\Cycling '74\Max 6.1\Cycling '74\java\lib\sphinx4max.jar
MXJClassloader CLASSPATH:
C:\Program Files (x86)\Cycling '74\Max 6.1\Cycling '74\java\classes\
Jitter initialized
Jitter Java support installed
op.recognize based on CMU SphinX4 _Olivier Pasquet _2006 - rev 2009

vichug

does look good, i'm on osx 10.6.8 and have comparable things, i succeed loading digits.gram, alas not hellongram.trigram.lm
errors reported:
class not found !
java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
and
op.recognize-> error allocating Property exception component:'lexTreeLinguist' property:'acousticModel' - component 'hub44' is missing
edu.cmu.sphinx.util.props.InternalConfigurationException: component 'hub44' is missing

metameta

is the source code available anywhere? I may end up following in Olivier's footsteps and wrap Sphinx for max, but if the work has already been done, it would be nice to just debug what's there.

vichug

you should get in touch directly with him then :) (through his website)

ironside

Hi Vichug. I get the same problem as you - did you ever find a solution?

vichug

Hey, sorry, it's been some time... but i'm not sure i ever succeeded tbh... i think i was hoping for Metameta to succeed in recompiling the sources...

ironside

Thanks for letting me know. I'll try on a different computer and mention if I have any luck.

lalalali

I am trying to install op.recognize into Max 6.1 but I am having an issue when it comes to going to the lib directory in the terminal.
I put the files into the folders as indicated in the readme.txt.
What should I do afterward? When I type the directory (/Applications/Max 6.1/Cycling '74/java/lib) in the terminal I get a message saying "No such file or directory"...

Help please!!

Thanks in advance
Aline

psenough

Been trying to get op.recognize to work on Max 7 under Windows. No success so far, but i figured i'd share my steps here, and maybe someone has any additional insight?

I don't use Max/MSP regularly, apologies if something seems basic, it all feels equally daunting to me.

This is what i figured out so far:
- Java and Max need to be installed with the same bitness (both 32-bit or both 64-bit)
- need to place the library files under specific paths
C:\Users\username\Documents\Max 7\Packages\recognize\java-classes\lib\jsapi.jar
C:\Users\username\Documents\Max 7\Packages\recognize\java-classes\lib\sphinx4max.jar
C:\Users\username\Documents\Max 7\Packages\recognize\java-classes\op\recognize$1.class
C:\Users\username\Documents\Max 7\Packages\recognize\java-classes\op\recognize$2.class
C:\Users\username\Documents\Max 7\Packages\recognize\java-classes\op\recognize$3.class
C:\Users\username\Documents\Max 7\Packages\recognize\java-classes\op\recognize.class

If there is no "Packages\recognize\java-classes\op\" you get this error message:
Could not load class 'op.recognize'

If the jars are not found when Max launches, you'll get this kind of error when running the patches using op.recognize:
op.recognize-> file not loaded or not enabled or not ready yet

If both are present on the paths noted above you get a much nicer:
op.recognize-> error allocating Property exception component:'jsgfGrammar' property:'grammarLocation' - Bad URL C:\Users\username\Downloads\recognize\recognize\simple-examplesunknown protocol: c
edu.cmu.sphinx.util.props.InternalConfigurationException: Bad URL C:\Users\username\Downloads\recognize\recognize\simple-examplesunknown protocol: c

Googling this up i'm still a bit unsure what the problem is exactly, might be some mac/windows different path naming issue?! not sure if i can fix it on max or would require some patch on the sourcecode.
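
The "unknown protocol: c" part of that message is consistent with the Windows path being parsed as a URL, with the drive letter read as the URL scheme. The original error comes from Java inside Sphinx4, so the Node.js snippet below is only an analogy showing the same kind of parse. The helper `windowsPathToFileUrl` is hypothetical, and whether op.recognize would accept a `file:` URL in place of a bare path is an untested assumption.

```javascript
// Analogy only: Node's WHATWG URL parser also reads the drive letter of a
// bare Windows path as the URL scheme.
const windowsPath = 'C:\\Users\\username\\Downloads\\recognize\\recognize\\simple-examples';
const parsed = new URL(windowsPath);
console.log(parsed.protocol); // "c:"

// Hypothetical helper: express the same location as a file: URL string.
// encodeURI escapes spaces and similar characters but leaves ":" and "/" alone.
function windowsPathToFileUrl(p) {
  return 'file:///' + encodeURI(p.replace(/\\/g, '/'));
}

console.log(windowsPathToFileUrl(windowsPath));
```

If the decompiled class builds a URL directly from the string it is given, passing the `file:///C:/...` form would be the first thing to try before patching any code.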

The original sourcecode isn't available so i decompiled the classes but i never did a max external before so i'm wondering how to actually set the project up (i guess there is a tutorial for this somewhere) and what the issue might be exactly (old max version objects used?! mac/windows path differences?!) so i'm just going to end up making random basic tests to try to figure it out.

Was hoping someone with more max externals knowledge would read this and give me some hint...

hoowdie

Hey, has anyone ever found a solution to get this running on Windows?
Thanks in advance!

yaniki

I have been looking at the issue of speech recognition lately.

In my opinion, it is worth paying attention to the speech recognition system built into Chrome browser. It works on many platforms (inter alia MacOS and Windows) and it's rather robust. It's very easy to write a language recognition mechanism (actually a simple HTML + JavaScript code) in your chosen language (Chrome speech-to-text engine is multilingual) running "inside" the browser, and redirect data (recognised strings) to Max via Node.js and sockets.

Of course from user's point of view it will be easier to bring speech recognition into Max via specialised external, but, if you are looking for an efficient and multilingual mechanism - I would suggest using the engine built into Chrome.

Snickers

> It's very easy to write a language recognition mechanism (actually a simple HTML + JavaScript code) in your chosen language (Chrome speech-to-text engine is multilingual) running "inside" the browser, and redirect data (recognised strings) to Max via Node.js and sockets.

@yaniki Is there any chance you could create a demo patch for this process? Many thanks in advance.

yaniki

Ok, here is the example. It's very sketchy and there is a lot of room for further development/tuning, but the general idea is, as I think, explained.

The entire mechanism is based on 3 elements:

1) A "webpage" (an HTML document with JavaScript) using the speech-to-text engine built into Google Chrome (theoretically any browser handling WebKit STT should work - however, if I'm not wrong, currently only Chrome supports this).

2) Simple Node script for messaging from Chrome to Max.

3) Simple Max patch showing how to receive and process data sent from Chrome.

The webpage is based on P5js (https://p5js.org) framework - but it should be relatively simple to edit the code a bit and to remove this dependency from the example if you want to. JavaScript code responsible for handling the speech-to-text processing is located in the "sketch.js" file (in the "SpeechToText" pseudoclass).

The messaging system is built on top of the Socket.io library for Node.js. I have not tested this code in the Node version built into Max, but it should work, too. However, I recommend starting with standalone Node, because it is a proven solution. Remember to install Socket.io for Node.

Example Max patch is using the same HTML/JavaScript document, which is used for the speech-to-text processing. This document - if loaded inside Max - serves just as a "data router" receiving data from Chrome and sending them to the parent patcher. Check Max documentation for details about communication between content of the [jweb] and Max patcher. You may also check my simple solution for messaging between Max and P5js available here: https://www.paweljanicki.jp/projects_maxandp5js_en.html

How to use it:

1) Execute the "socketio_bridge.js" script with Node.js to establish communication mechanism (I assume, you're dealing with basics of the Node.js already).

2) Start the "max_client.maxpat" patch with Max (make sure the Max Console is open).

3) Load the "index.html" file into the Google Chrome browser. Allow microphone access. Say something to feed the speech-to-text converter - you should see detected words in the browser window, in your terminal (running and monitoring Node) and in the Max patcher (and console). It is also a good idea to open the Chrome console to monitor any problems.

As I mentioned: it is a very simple and sketchy solution. You can switch to the Node built into Max, clean up the code, and make it more error-proof.

Chrome's speech-to-text mechanism is a little bit capricious, but you can master it and get it to cooperate. Typical problems are: refusal to work if there is no access to the internet (so make sure you are connected) and automatic deactivation of the speech-to-text detector after some time when nothing was detected. You can prevent the latter by adding an automatic webpage restart on error (check "sketch.js" for tips) and by allowing, for this particular document, access to the microphone without a request from the browser. This last step saves the time spent on clicking to confirm access to the microphone.
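
As an illustration of the automatic-restart idea, here is a minimal sketch that restarts a recognizer from its `onend` handler. It is not the code from "sketch.js": the function name `keepListening` is made up for this example, and the recognizer constructor is passed in as a parameter (in Chrome it would be `window.webkitSpeechRecognition`) so that the logic can also be tried with a stand-in object outside the browser.

```javascript
// Sketch only: keep a Web Speech API recognizer running by restarting it
// whenever the browser ends the session on its own.
function keepListening(RecognitionCtor, { lang = 'en-US' } = {}) {
  const recognition = new RecognitionCtor();
  recognition.continuous = true;
  recognition.lang = lang;

  let wanted = true; // cleared by stop() so that we do not restart forever

  // `onend` fires when recognition stops for any reason, including the
  // browser's own timeout after silence.
  recognition.onend = () => {
    if (wanted) recognition.start();
  };

  recognition.start();

  return {
    recognition,
    stop() {
      wanted = false;
      recognition.stop();
    },
  };
}
```

In a page this would be called as `keepListening(window.webkitSpeechRecognition, { lang: 'en-US' })`. A more careful version would also look at `onerror` and stop retrying when microphone permission has been denied.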

Ufff... ok, have fun ;-)

Attachment: maxchromestt.zip (153.71 KB)

chupilcon

Hello Yaniki, I would be very interested in trying your Chrome speech-to-text tool. However, when I load "socketio_bridge.js" into the js object, Max says "js: socketio_bridge.js: Javascript TypeError: http.createServer is not a function, line 5". I am new to js and I don't know if I am doing something wrong or if there is a typo in the code. Any help much appreciated. Thanks!

yaniki

@CHUPILCON "socketio_bridge.js" is a simple script that allows communication between MaxMSP and Chrome. It should be launched from Node.js (as I wrote in the instructions in the previous post), not from [js] object in Max.

Snickers

@YANIKI Amazing, thank you!
I only just found your response and it works like a charm! Thank you so much for your direction. This gives me so many areas to research but also a working model that I can start to mess around with. Thank you for your very clear instructions, they were very helpful. Can't thank you enough! Very much appreciated.

yaniki

Dear SNICKERS

Thank you for the kind words and feedback - I had already thought that nobody would be interested :-). I am glad that my solution works well (actually, information that this mechanism works well on computers used by other people is very valuable) - I should finally sort out this code and switch to communication via the Node.js built into Max, to simplify the project and make it more elegant.

vichug

@yaniki i think speech recognition (and speech synthesis) in Max are two lacking areas, and probably lot of users will be interested by this solution that seems easy enough to use ! though i'm concerned that this only works with an internet connection, does it use an online google service, or is it something that is really built into Chrome and just need online mode for authorization or something ?

yaniki

The "obligatory" connection to the Internet is puzzling me: theoretically, speech detection is built into browser, but does not really work if we're not online (a good starting point for conspiracy theories).

K_G

@Yaniki thank you for sharing your work! I tried it out and it works great. However, I can't figure out how to avoid the request from the browser. Chrome doesn't allow me to change the permission settings, so I have to keep clicking allow. I am pretty new at this, so I might have misunderstood something.
Thanks again!

2-xite

Hello to the readers,

Here is a newbie question.
I'm trying to start socketio_bridge.js by typing "node socketio_bridge.js" in the terminal of a Windows 10 machine (please take a look at the extract below). But it does not start and I cannot link index.html to max_client.maxpat.
Any suggestions?
Thank you greatly!

Alfredo


C:\Users\Alfredo\Desktop\maxchromestt>node socketio_bridge.js
Server started, listening on port 8081.
internal/modules/cjs/loader.js:638
    throw err;
    ^

Error: Cannot find module 'socket.io'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at Object.<anonymous> (C:\Users\Alfredo\Desktop\maxchromestt\socketio_bridge.js:14:10)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)

yaniki

Hello 2-XITE

Unfortunately, I don't have access to a Windows computer right now, so I'll guess... ;-). Did you install the socket.io npm package for Node? If not, type and execute:

npm install socket.io

in your terminal window.
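
To check for this situation before starting the bridge, a few lines of Node can test whether a package can be resolved at all. This is just a convenience sketch using Node's built-in `require.resolve`; the function name `missingModules` is made up for this example and is not part of the original project.

```javascript
// Report which of the given packages Node cannot resolve from this script's
// location. Uses only Node built-ins, so it runs even when nothing is installed.
function missingModules(names) {
  return names.filter((name) => {
    try {
      require.resolve(name);
      return false; // found
    } catch (err) {
      return true;  // not installed (or not reachable from this folder)
    }
  });
}

const missing = missingModules(['socket.io']);
if (missing.length > 0) {
  console.log('Missing packages: ' + missing.join(', ') +
              '. Try: npm install ' + missing.join(' '));
} else {
  console.log('All required packages were found.');
}
```

Saved next to socketio_bridge.js and run with node, it prints either the missing package names or a confirmation.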

2-xite

Good morning Yaniki,

No, I had not installed it (I did not know how), but following your advice I managed to get the tool working!
Thank you for this important support, it works!

With compliments,

Alfred

Mario Fernando Cardoso

Hey! Great Work!!

I have a question... I can't make it work in Max MSP. I followed the instructions, but I have never used Node.js or JavaScript before; this is my first time. I think that I'm doing something wrong when trying to execute the code. How do you execute "socketio_bridge.js" in Node.js? Google Chrome is already detecting words, but I can't make them appear in MaxMSP.
I tried to just write "socketio_bridge.js" and press enter, and I also tried to copy-paste the code and press enter, but I'm not sure this is the right way. How do I open "socketio_bridge.js" in Node.js?
I know it's a very noob question. I thought it was going to be easy following the steps, but I underestimated it... I guess it's a good opportunity to start learning JavaScript. But if any of you can guide me on this one, I will really appreciate it.

Thanks!

yaniki

Hi Mario

First, you have to install Node.js from this page: https://nodejs.org/en/ (and probably to restart your computer)

Then you need to install socketio. To do this, open a terminal window and type:

npm install socket.io

Press enter and wait until the library is installed.

Once you have installed Node and socketio you can use the bridge. To execute socketio_bridge.js, navigate to the folder with the project in your terminal window, type

node

followed by one space character (so it reads "node "), drag socketio_bridge.js into the terminal window, and press enter.

Alternatively, you can use Node for Max instead of the standalone version, but I haven't tested this.

Roland Cahen

Hello,

Thank-you very much Pawel
I've worked it out. Speech to text works pretty well in various languages (set in the sketch.js script) and the transmission to Max is straightforward.
I have a few questions related more to the Chrome Speech API:
Is there a way to set the timeout that separates successive answers?
How could I keep the mic locked on in Google Chrome? At the moment, Google Chrome always stops the access to the microphone after a minute or two and asks to allow access manually.
Is there a way to use Firefox instead of Chrome with the same tool?

Best regards
Roland

yaniki

Hi Roland, thanks for kind words.

Unfortunately, Chrome's Speech API is rather unique and has no equivalent in Firefox.

I don't know why Chrome locks up the microphone after a minute - I have not encountered such behavior before. Maybe it's some mechanism in the privacy settings? All I can say now is that the current version of Chrome is running continuously on my macOS computer - in fact, there is a fragment in the script that automatically restarts the speech recognition system after the browser exits it by itself, and it has worked fine so far.

Hank Scorpio

Hi I am reviving this old thread in the hope that someone will be able to help.

I am trying to run Yaniki's very helpful code but I get the following error when I try to run socketio_bridge.js

TypeError: require(...).listen is not a function
at Object.<anonymous> (D:\Documents\MEGA\Program-Files\My-Website-Files\p5-speech-recognition\socketio_bridge.js:16:31)
at Module._compile (node:internal/modules/cjs/loader:1103:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1155:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47

Does anyone have any ideas about how to resolve this?

yaniki

Hello friends!

I will try to come back to this mechanism in my free time. It was never a fully developed solution - rather a sketch and I did not expect it to be still in use after so many years. Probably the cause of the problem is the tightening of browser security systems - it makes sense, but also makes it difficult for people like us. I'll try to do something about it anyway.

Hank Scorpio

Thanks Yaniki,

Let me know if you come up with anything. I had a mess around with your code and by changing the line

var io = require('socket.io').listen(server);

to:

var io = require('socket.io')(server)

I got rid of the error, but I still get an issue on chrome where it says net::ERR_CONNECTION_REFUSED and no data is passed to node or max from the browser.
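
For anyone whose installed socket.io behaves differently again, the two call shapes mentioned above can be wrapped in a small helper that picks whichever one the module offers. This is a hypothetical convenience function, not part of Yaniki's script, and it only addresses the TypeError; it does nothing about the net::ERR_CONNECTION_REFUSED message, which suggests the browser is not reaching a listening server at the address it is trying.

```javascript
// Hypothetical helper: attach socket.io to an existing HTTP server using
// whichever of the two call shapes from this thread the module provides.
function attachSocketIo(socketio, server) {
  if (typeof socketio.listen === 'function') {
    // Shape used by the original script: require('socket.io').listen(server)
    return socketio.listen(server);
  }
  if (typeof socketio === 'function') {
    // Shape that removed the TypeError: require('socket.io')(server)
    return socketio(server);
  }
  throw new TypeError('Unrecognised socket.io export shape');
}

// Intended use inside the bridge script (not executed here):
//   var io = attachSocketIo(require('socket.io'), server);
```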

Jihyeon Kim

@YANIKI, first, thank you for your post.
It's been a long time since you uploaded the file, but I want to ask you something.
I want to extract text data from the "webkitSpeechRecognition" results,
but I can't understand how to do it.
Can I get some ideas or ways to do this?
Thank you for reading.
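
A minimal sketch of one way to pull the text out of a Web Speech API `result` event is shown below. The function name `textFromResultEvent` is made up for this example and is not taken from "sketch.js"; it relies only on the standard event fields `resultIndex`, `results`, `isFinal` and `transcript`.

```javascript
// Collect final and interim text from a Web Speech API `result` event.
// Works on any object shaped like a SpeechRecognitionEvent.
function textFromResultEvent(event) {
  let finalText = '';
  let interimText = '';
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    // Index 0 is the first alternative (the only one unless
    // maxAlternatives has been raised on the recognizer).
    const best = result[0];
    if (result.isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText, interimText };
}

// In Chrome this could be wired up roughly as follows (not run here):
//   const recognition = new webkitSpeechRecognition();
//   recognition.interimResults = true;
//   recognition.onresult = (e) => console.log(textFromResultEvent(e));
//   recognition.start();
```

The `finalText` string is what would normally be forwarded to Max; `interimText` changes as the engine revises its guess.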

yaniki

Friends

I plan to refresh this mechanism, but am still burdened with other projects. Maybe I can get back to that in the next few weeks.

Thomas

very nice ! i've been looking for STT for a long time, and your work opens a door
hope you will refresh it !
- do you think there is a way to make it work locally, without internet ?
- a way to lock the microphone in chrome ..
Best regards
Thomas

Thomas

Hi @Yaniki ! did you find a trick to work offline and to lock microphone since last year ? :D
best regards
Thomas

Ostin Solo

I have developed a system that locks the microphone, but it cannot be used offline yet. Please contact me on my IG under ostin.solo and we can finalise the system together.

cheers

yaniki

Dear Friends

I've updated the mechanism I posted some time ago: now the speech detection results in an external browser are sent to MaxMSP using Node4Max, so it's easier to handle it all.

https://www.paweljanicki.jp/projects_maxandp5js_en.html

2-xite

Dear YANIKI,

Thank you very much for your valuable work. I have been using your tool for some time now and am very happy with it.

Do you perhaps already know if I can solve the well-known problem (I am on Win10) with Chrome so that the browser does not pop up the privacy and microphone message after a few seconds?

It is, of course, not feasible to press ‘allow’ every time. I have created an interim patch to automate this confirmation (with an auto click + timer), but in practice it does not work very well... I do not know how to disable the microphone notification....

I look forward to hearing from you or perhaps from other Max-users.

Thank you and best regards,

2-Xite

yaniki

Hello 2-XITE

Thank you for your kind words!

I'm not sure whether the answer to your question is within my modest competence. Maybe someone else will be able to do it.

But what comes to my mind right now is the following procedure:

1) Open example in Chrome (something like http://localhost:5001 should appear in the address bar).

2) Confirm access to the microphone when the browser asks.

3) Go to Chrome's Settings -> Privacy and security -> Site settings

4) Set ("allow") access to the microphone for localhost.

It works on my machine (Chrome doesn't ask again when reloading a site with the same address pattern) - however, I tested this tip on Mac only (I have no PC around). This should be a cross-platform solution, but, of course, real life always provides us with some unexpected phenomena.

Anyway: this problem with privacy settings is something I would like to solve in the future, if possible, and will keep an eye on it. If I find a solution I will definitely share it.

2-xite

Thank you, YANIKI for the detailed advice.

I think I'd better find a second-hand Mac mini instead of a Windows machine....

I checked all browser settings and they were just open (or allowed access to the microphone). So I think it's a kind of bug in the OS after all...

In any case, your mechanism works well and that's what matters most. For now, I have to use an auto-click and a timer (set to exactly 9 seconds) for auto-confirmation.

With compliments and thanks.

2-Xite

yaniki

I made a small correction - now in continuous mode the script should automatically resume speech recognition if the browser turns it off by itself (I noticed this behavior in Chrome). The corrected version is now uploaded and available at the same link.

2-xite

Dear YANIKI,

Thank you for this great update. I've managed to use your new version for MaxMSP with Node4Max. And indeed, the script is no longer interrupted by the browser asking for permission to use the microphone here on W10, which makes STT great to use.

As far as I'm concerned, it's worth at least a large cup of coffee (see my email).

Just another technical question: do you happen to know how I can reduce the response time of the STT engine without setting the ‘interim’ option to ‘false’, which makes it so slow?

As it is, turning interim off gives a correct but slow response. Is there something in between? I would appreciate hearing from you.

With compliments and best regards,

2-Xite

yaniki

Hello 2-XITE

I'm glad the newer version works better!

Unfortunately, I don't have a solution to the issue you're raising. Some browsers may respond faster than others, but I haven't personally noticed any significant differences. Indeed, in "non-interim" mode, speech recognition is more accurate, but slower - this makes sense because with a longer phrase, the speech detection engine can correct the result by placing it in a broader context (at the cost of a longer delay). The Web Speech API doesn't provide any parameters here, but if I think of something, I'll definitely share it.

2-xite

Hi YANIKI,

Thank you for the comment, of course, and even if it is not possible, it is still a great tool!

I will keep an eye on all updates to this topic.

Year

2007-2009

Location

Tokyo, Japan

Links

http://www.opasquet.fr/dl/recognize.zip

Author

Olivier Pasquet