Chatbot with Elevenlabs responding to itself.

Sean Sanchez

Oct 08 2023 | 4:04 pm

Hi community. I've been working on a chatbot in Max using two node.js objects: one for the Deepgram API and the other for the OpenAI and Elevenlabs APIs. The Deepgram.js object converts speech to text and sends it as a message to the elevenlabs-openai.js object, which processes the prompt and plays the audio response from Max. The problem is that the Deepgram.js object also hears that audio response and transcribes it as the next prompt, causing a feedback loop. I've tried adding a toggle to turn the recorder library in the Deepgram.js object on and off, but I'm new to JavaScript and couldn't get it to work properly. Any help or suggestions would be appreciated.

TFL

Oct 08 2023 | 7:09 pm

What you want to achieve is not very clear.
If I understand correctly, you first say a phrase into a mic; Deepgram transcribes the audio and sends the text to elevenlabs_openai, which synthesizes audio from it and plays it through the speakers; that audio is then captured by the mic and sent back to Deepgram, and so on. Is that it?
If so, I guess you do want to control when Deepgram receives audio from the mic instead of having it always listening. But I can't tell how to do that without seeing your patch and js files. Is the audio captured within Max directly, or is it handled by Node?

Also, maybe it's just a design issue: in the context of a chatbot, why would you say something and then hear the computer repeat what you just said in a synthesized voice?

By the way, I'm quite curious to see how a message degrades or evolves through many iterations of that feedback loop.

Sean Sanchez

Oct 09 2023 | 9:49 am

Thanks for your response. The goal is to create a conversational chatbot in Max with little to no manual input (i.e., no typing text or toggling on/off controls, apart from starting the node.js scripts). The audio is captured and handled by Node in the maxDeepgram.js object using the recorder library. I'm leaning towards a toggle that calls pause() or resume() on the recorder. I've attached the patch and node.js files, but you'll need your own API keys to try them out. I'd appreciate any suggestions :)

maxChatBotFinal.maxpat
Max Patch

maxDeepgram.js
text/javascript 1.92 KB

elevenlabsChatGPT.js
text/javascript 1.75 KB

TFL

Oct 09 2023 | 2:48 pm

According to the recording library documentation, you can pause your recording by calling "recording.pause()" and resume it with "recording.resume()".
You can try declaring handlers for them by adding the following code

maxAPI.addHandlers({
    pause: () => recording.pause(),
    resume: () => recording.resume()
});

in maxDeepgram.js. Then, before sending your "message" message to elevenlabsChatGPT, send the message "pause" to maxDeepgram.js to call that handler and actually pause the recording. Once the response has finished playing through the speakers, send "resume" to maxDeepgram.js.

I haven't tested this, but it should work straight away.
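
If the "pause" and "resume" messages can arrive more than once in a row (for example, from a toggle in the patch), a slightly more defensive variant is sketched below. It is only a sketch under assumptions: makeGateHandlers is a hypothetical helper name, and fakeRecording is a stand-in so the snippet can run outside Max. In maxDeepgram.js you would pass your real recording object instead.

```javascript
// Hypothetical helper: wraps any object exposing pause() and resume()
// and ignores repeated "pause" or "resume" messages.
function makeGateHandlers(recording) {
  let paused = false; // tracks whether the recorder is currently paused

  return {
    pause: () => {
      if (paused) return; // already paused, nothing to do
      recording.pause();
      paused = true;
    },
    resume: () => {
      if (!paused) return; // already running, nothing to do
      recording.resume();
      paused = false;
    },
  };
}

// Stand-in for the real recording object (assumption, for illustration only).
const callLog = [];
const fakeRecording = {
  pause: () => callLog.push("pause"),
  resume: () => callLog.push("resume"),
};

const handlers = makeGateHandlers(fakeRecording);
handlers.pause();
handlers.pause(); // ignored: the recorder is already paused
handlers.resume();
console.log(callLog); // pause() and resume() were each called exactly once

// Inside maxDeepgram.js (assuming maxAPI and recording are already defined there):
// maxAPI.addHandlers(makeGateHandlers(recording));
```

In the patch, the order stays the same as described above: send "pause" before the prompt goes to the ElevenLabs/OpenAI script, and send "resume" only after playback has finished.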