Text-to-speech options?

Brown's icon

Hey all,

I'm working on a piece that requires text-to-speech, and someone other than me will be performing it 90% of the time. I'm trying to find a cross-platform solution that doesn't require the performer to install a ton of extra garbage. Here is a list of my dead ends; can anyone come up with other options?

1. Mac OS X say command through the shell external plus Soundflower, or using aka.speech. Works fine, but isn't cross-platform.

2. mbrola~ object. This does work, but I'm using generated text, and I have yet to find a library that does just the text-to-phoneme conversion to go with it. I looked at eSpeak (it can't find its files), a separate eSpeak for Mac installer (crashes frequently), lmtool from CMU (Sphinx doesn't seem to include pitch/duration information, which mbrola~ needs), and a couple of others that are either defunct or install improperly.

3. HTML5 Web Speech API with Soundflower. Make a dummy page with jweb, then use the executejavascript message to send the following to the page:

var msg = new SpeechSynthesisUtterance('Hello World');
window.speechSynthesis.speak(msg);

Doesn't work. It seems like the Chromium Embedded Framework currently doesn't support the Web Speech API, though Chrome does. I was excited about this possibly working, because in Chrome, the voice used sounds exactly like the voice used for turn-by-turn directions in the Google Maps app.

4. Use the GET-only version of the Google Translate API, and send fewer than 100 characters at a time. Not ideal, because you first have to pass a CAPTCHA to authenticate, and afterwards it delivers an mp3 in-browser that you have to click on to play. I could write some JavaScript to reload the page with a new Translate API request each time, and automate the button press (maybe), but it's iffy at best. Like the Web Speech API, its timing won't be perfect because it's asynchronous, and it's legally shady because the performer might make too many API requests and get blocked.
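
For the under-100-characters limit in option 4, the splitting step could look roughly like the sketch below. The function name chunkText and the break-at-spaces strategy are my own assumptions for illustration; nothing here talks to the Translate API, it only prepares the text.

```javascript
// Hypothetical helper: split text into pieces shorter than maxLen characters,
// breaking at whitespace so words stay intact where possible.
// maxLen defaults to 100, the limit mentioned in option 4.
function chunkText(text, maxLen) {
  maxLen = maxLen || 100;
  var words = text.split(/\s+/).filter(function (w) { return w.length > 0; });
  var chunks = [];
  var current = "";

  words.forEach(function (word) {
    // A single word at or over the limit gets hard-split into pieces.
    while (word.length >= maxLen) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(word.slice(0, maxLen - 1));
      word = word.slice(maxLen - 1);
    }
    var candidate = current ? current + " " + word : word;
    if (candidate.length < maxLen) {
      current = candidate;      // still fits in the current chunk
    } else {
      chunks.push(current);     // close the current chunk, start a new one
      current = word;
    }
  });

  if (current) {
    chunks.push(current);
  }
  return chunks;
}
```

Each chunk would then go out as its own request, which of course makes the too-many-requests problem worse rather than better.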

I'll go back to aka.speech and hope the performer uses a Mac if all else fails, but it would be super cool to find a better solution. Any ideas are appreciated!

vichug's icon

there was a discussion about this here: https://cycling74.com/forums/text-to-speech-2/
i think the best offline solution (so far) is shell for mac and the shell equivalent for windows, identifying the platform from the max patch
i would be very glad to be wrong though
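
a rough sketch of that platform switch, with everything beyond the mac say command being an assumption on my part: the function name speechCommand is made up, the platform strings "macintosh" and "windows" are what i believe Max's js object reports ("darwin" and "win32" are what Node reports), and the windows branch assumes PowerShell with the .NET System.Speech assembly is available on the performer's machine. how the resulting arguments get handed to a shell-type external is not shown.

```javascript
// Hypothetical helper: pick a speech command for the current platform.
// Returns an argument list, or null if the platform is not handled.
function speechCommand(platform, text) {
  if (platform === "macintosh" || platform === "darwin") {
    // Built-in macOS command, as used in option 1 of the original post.
    return ["say", text];
  }
  if (platform === "windows" || platform === "win32") {
    // Assumption: PowerShell plus System.Speech is present.
    // Single quotes are doubled, which is how PowerShell escapes them
    // inside a single-quoted string.
    var escaped = text.replace(/'/g, "''");
    return [
      "powershell",
      "-Command",
      "Add-Type -AssemblyName System.Speech; " +
        "(New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('" +
        escaped + "')"
    ];
  }
  return null; // unknown platform: the caller decides what to do
}
```

returning null for anything else keeps the patch from silently sending a mac command to a machine that can't run it.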