text to speech...

Emerson's icon

Hello, could anyone recommend a practical way to do text to speech in Max 7? I need a solution for both Mac and Windows. Thanks.

metamax's icon

On Mac, prepend the text you want spoken aloud with 'say' and send it to [shell]. Works great.

foldh's icon

Anything like this for Windows?

metamax's icon

foldh wrote: "Anything like this for Windows?"

I think so. Peter's Text to Speech (ptts) is free and it has similar options for tweaking sample rate, volume, voice, reading text files, generating wav files, etc.

foldh's icon

Thanks for the suggestion Metamax, but I am looking more for an actual Max object. Something along the lines of Bill Orcutt's shell.

Emerson's icon

Seems hard to solve for Windows. I also wonder if there is another option, such as a VST plugin.

metamax's icon

foldh wrote: "but I am more looking for an actual Max object. Something along the lines of Bill Orcutt's shell."

Huh? That's what I just suggested. [shell] doesn't translate text to speech for Mac either, it just communicates with the OS via shell script. Same with Windows. The only difference with Windows is you install an open source program to receive text to speech messages from [shell].

metamax's icon

Emerson wrote: "Seems hard to solve for Windows."

What is hard to solve? Download the jampal-windows setup and run it. Installing the [shell] object is more complicated because you have to manually place it in a folder. Once you're done, you can use [shell] to get instant text to speech anywhere within Max.

Either way, thank you for posting this thread. I wasn't aware of this feature.

vichug's icon

@emerson: in that case, your best bet is indeed to have your patch detect which OS it is running on (OS X or Windows) and send the output to the shell object, or to its equivalent on Windows.

Text to speech is a complicated and advanced synthesis technique that in fact involves a lot of components (grammar detection, translation of text to a readable format, then the synthesis itself, which is not a trivial kind of synthesis). Those objects communicate with OS-provided tools, which are very powerful and embedded in the operating system (be it Windows or Mac). The prevalence of black-box (closed, all-in-one, neither open source nor customizable) text-to-speech solutions in a lot of modern devices may lead you to believe it's easy to do, but that is not the case if you don't use said black boxes. The closest thing to a Max-only solution would be mage~ https://github.com/numediart/mage or mbrola~ http://tcts.fpms.ac.be/synthesis/mbrola.html but they are not easy options (in particular, Mage is very powerful but doesn't offer an easy way to convert text into a computable format, and it is not compiled for Windows). So the solutions suggested by Metamax are actually the easiest, most convenient and most cross-platform you could find atm.

vichug's icon
Max Patch

To get system information, send the "getsystem" message to the Max application.
You can get more info with the [gestalt] object, but I'm not sure how to use it. getsystem will just output windows or macintosh, but that's probably enough for your purpose.
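
If you would rather do the OS switch in Node for Max (Max 8 or later) than with patch objects, the idea could be sketched roughly like this. It is only a sketch under assumptions: buildSpeakCommand and windowsTtsExe are made-up names, and the Windows branch assumes you have installed some command-line TTS program that takes the text as its first argument.

```javascript
// Sketch: pick a text-to-speech command for the current OS.
// `windowsTtsExe` is a hypothetical path to whatever command-line TTS
// program you installed on Windows; it is assumed to accept the text
// to speak as its first argument.
function buildSpeakCommand(text, platform, windowsTtsExe) {
    if (platform === 'darwin') {
        // macOS ships with the `say` command
        return { cmd: 'say', args: [text] };
    }
    if (platform === 'win32') {
        if (!windowsTtsExe) {
            throw new Error('No Windows TTS executable configured');
        }
        return { cmd: windowsTtsExe, args: [text] };
    }
    throw new Error('Unsupported platform: ' + platform);
}

// Possible usage (process.platform is 'darwin' on macOS, 'win32' on Windows):
// const { execFile } = require('child_process');
// const { cmd, args } = buildSpeakCommand('hello', process.platform, 'C:/tools/voice.exe');
// execFile(cmd, args);
```

Keeping the command selection in a plain function, separate from the code that actually launches the process, makes it easy to check both branches on a single machine.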

Emerson's icon

@metamax: Thank you for your recommendations. By "seems hard to solve" I meant using/controlling "Windows ways" from within Max. (An "mxj DosHack" is mentioned in other threads, but the quality of the Windows TTS feature didn't suit me, so I gave up on that attempt.)

@vichug: Thank you for your replies and for the OS choosing patch. My question wasn't definite enough, my bad. Atm I wouldn't be reluctant to prepare separate patches for different OSes if I could find satisfactory sounding TTS solutions, but so far I couldn't.

Some of the more elaborate solutions out there require supplying details for every unique input, which might not be efficient considering the amount needed for the patch I am programming.

I think I may have to do lots of recordings with real people, at least for a definitely "better sounding" solution.

midinerd's icon

Hey there -

I found a Windows way but it is hacky.

Using bernstein [shell] as linked above.
https://cycling74.com/tools/bernstein-shell/

The 3rd-party binary for text-to-speech was "voice.exe" from:
https://www.elifulkerson.com/projects/commandline-text-to-speech.php

The (message) I sent into [shell] was formatted as follows:
powershell -c (Start-Process 'C:/users/my.user.folder/Downloads/voice.exe' 'This will open a separate shell window and play back the sound.')

Obviously the path to your 'voice.exe' will vary.
- C:\some\download\location\voice.exe
becomes
- C:/some/download/location/voice.exe
within the message to [shell]

It opens up a separate shell window, calls the command, and closes.
There's a start-up/tear-down time for the powershell prompt, stuff like that.

So it certainly leaves things to be desired - but knowing that you can call 'powershell' to jump out of the confines of [shell] itself must be nice. Right? Yep.
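
For anyone building that message programmatically, here is a rough sketch of the path conversion and message format described above. The function names are made up for illustration, and the quoting rule (doubling single quotes inside a single-quoted PowerShell string) is the only thing added beyond the format shown in the post; how [shell] itself treats the quotes has not been checked here.

```javascript
// Convert a Windows path to the forward-slash form used in the
// message to [shell], e.g. C:\tools\voice.exe -> C:/tools/voice.exe
function toShellPath(windowsPath) {
    return windowsPath.replace(/\\/g, '/');
}

// Wrap a string in single quotes for PowerShell. Inside a
// single-quoted PowerShell string, a literal ' is written as ''.
function psQuote(s) {
    return "'" + s.replace(/'/g, "''") + "'";
}

// Build the message in the same shape as the one shown above:
// powershell -c (Start-Process '<exe>' '<text>')
function buildPowershellMessage(exePath, text) {
    return 'powershell -c (Start-Process ' +
        psQuote(toShellPath(exePath)) + ' ' + psQuote(text) + ')';
}
```

The result can be sent out of a js or node.script object and into [shell] as the message to run.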

sonic shrubs's icon

@MIDINERD: thanks for sharing this! It also works with espeak (http://espeak.sourceforge.net/).
Isn't there a way to open the corresponding program (voice or espeak) in cmd.exe once and then only send it the text, to avoid constantly opening and closing the program?

Are there any other current solutions for tts for Max in Windows?

midinerd's icon

@Sonic Shrubs - Not really.

I haven't ever looked into this before. A user on the Facebook forum posted this question - unsurprisingly the Windows branch of functionality was the least solved.

I found that 'solution' with PowerShell, a huge hack that barely satisfies the requirements and hardly satisfies the long-term vision of manipulating TTS from Max/MSP on Windows.

For example, you can't capture the audio or manipulate it in Max, so it will always be dry - and issuing it through PowerShell is just a proof of concept that mostly proves it is possible, but not super fast or useful. If it meant the difference between realizing a project and not, I would deploy it on macOS for production. If I could work with one of the authors to port it to a Windows object, that'd be the best bet - I personally can't port it.

Martin Daigle's icon

I have looked into TTS for Windows lately, and found two good options!

TTSCompare.maxpat
Max Patch

The first version I tried is this one by Brian Ellis:

https://www.youtube.com/watch?v=eztLFYkJqz8&ab_channel=BrianEllisSound

I modified the JS code so the voice can be changed dynamically. It works, but it is not exactly real time, perhaps because it has to communicate with the voicerss server.

const http = require('http'); // or 'https' for https:// URLs
const fs = require('fs');
const maxApi = require('max-api');
const { StringDecoder } = require('string_decoder');

var myKey = "You_need_to_put_in_your_key";
var voice = "Voice_Change";

maxApi.addHandler('setKey', (key) => {
    myKey = key;
});

maxApi.addHandler('setVoice', (vce) => {
    voice = vce;
});

// Function to handle the 'say' command
function handleSayCommand(words) {
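    // encodeURIComponent also escapes '&', '?', '=' and '#', which encodeURI leaves intact
    const urlWords = encodeURIComponent(words);
    const fileWords = words.replace(/ /g,"_").replace(/\./g,"").replace(/\!/g,"").replace(/\?/g,"");
    // Declare with const so fileName is not an implicit global
    const fileName = "tts_"+fileWords+".wav";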
    if (fs.existsSync(fileName)) {
        // The file already exists, return
        maxApi.outlet("exists "+fileName);
        return;
    }
    
    const file = fs.createWriteStream(fileName);
    // Send the URL-encoded text; raw spaces or '&' in words would break the query string
    const request = http.get("http://api.voicerss.org/?key="+myKey+"&hl="+voice+"&src="+urlWords, function(response) {
        response.pipe(file);    
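        var errorMessage = "";
        var isError = false;
        // One decoder for the whole response, so characters split across chunks decode correctly
        const decoder = new StringDecoder('utf8');
        response.on('data', function (chunk) {
            // Decode each chunk only once (the original called decoder.write() twice per chunk)
            const text = decoder.write(chunk);
            if (text.startsWith("ERROR")) {
                isError = true;
                errorMessage = text;
            }
        });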
        response.on('end', function () {
            if(isError){
                maxApi.outlet("error "+errorMessage);
            } else {
                maxApi.outlet("success "+fileName);
            }
            console.log("isError: "+isError);
        });
    });
}

// Register handler for 'say' command
maxApi.addHandler('say', handleSayCommand);
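
One thing to watch in the code above: the file name is built straight from the spoken text, and only spaces, periods, exclamation marks and question marks are handled. Text containing characters such as / or : would give a file name Windows rejects. A stricter helper might look like the sketch below; ttsFileName is a made-up name and is not part of the original script.

```javascript
// Sketch: build a cache file name from the spoken text, keeping only
// characters that are safe in file names on both Windows and macOS.
function ttsFileName(words) {
    const safe = words
        .trim()
        .replace(/\s+/g, '_')             // runs of whitespace become one underscore
        .replace(/[^A-Za-z0-9_-]/g, '');  // drop everything else (punctuation, slashes, ...)
    return 'tts_' + safe + '.wav';
}

// ttsFileName('Hello, world! How are you?') gives 'tts_Hello_world_How_are_you.wav'
```

This drops non-ASCII letters too, and two different phrases can end up with the same file name, so for anything beyond a quick test a hash of the text may be a better cache key.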

The second is a version with espeak, which works great!

https://www.youtube.com/watch?v=ZoCGjAFhZRY&ab_channel=MarkSantolucito