Looking for examples/leads: resynthesizing a sound based on analysis of an incoming signal
Hi everyone,
I'm at the very start of what is going to be a long project, and looking for some examples of work or general advice on where to start.
The basic task is this: I want to take an incoming signal (my speaking voice) and have Max reconstruct a totally artificial signal based on it, using some other source, in real time. Say I have an audio file of someone else speaking, or a range of animal sounds. I might then try to use granular synthesis to draw from that corpus, following the pitch and timbre of my speaking voice, in a way that lets me speak in their voice.
I should note that I absolutely do not have the goal of creating any kind of passable deepfake. I'm expecting, and hoping for, glitchy and weird results, and I'm much more interested in using my voice to articulate audio drawn from field recordings or animal sounds, but the example above seems clearer.
There are cool projects out there which take an incoming signal and reconstruct it using a "best match" drawn from a corpus of grains, such as Rodrigo Constanzo's C-C-Combine... but that works on the basis of pre-built corpora with their associated metadata already created. I'm also wondering if there are more recent ways of approaching this.
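For what it's worth, the core of that "best match" step is essentially a nearest-neighbour lookup over precomputed grain descriptors. Here's a minimal numpy sketch of the idea (the corpus size, descriptor dimensions, and random data are all made up for illustration; this is not C-C-Combine's actual code):

```python
import numpy as np

# Hypothetical corpus: one descriptor vector per grain
# (e.g. pitch plus a few MFCC coefficients), precomputed offline.
rng = np.random.default_rng(0)
corpus_descriptors = rng.normal(size=(500, 13))  # 500 grains, 13-dim descriptors

def best_match(live_frame: np.ndarray) -> int:
    """Return the index of the corpus grain whose descriptor is
    closest (Euclidean distance) to the live input's descriptor."""
    dists = np.linalg.norm(corpus_descriptors - live_frame, axis=1)
    return int(np.argmin(dists))

# Simulate one incoming analysis frame and look up which grain to play.
incoming = rng.normal(size=13)
grain_index = best_match(incoming)
```

In a real-time patch this lookup runs once per analysis frame, and the chosen grain's index drives the playback stage (a KD-tree, as in FluCoMa's `fluid.kdtree~`, makes the search faster than brute force).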
At the moment I'm working through trying to learn the various FluCoMa objects, but I'm basically wondering if anyone has any thoughts, other examples of similar things, or even bits of patch they'd be willing to share.
Many thanks!
Hector.
In addition to FluCoMa, this is something you can achieve with Ircam's MuBu package, maybe with the help of the additional CataRT-MuBu package, as demonstrated in this video.
I highly recommend going through all the Ircam MuBu tutorial videos to get started with MuBu properly.

Ah great, thanks! Yes, I'll look into MuBu as well.
I think the step I've been struggling with, in following tutorials and videos, is creating a corpus that isn't just sliced up by onset detection, but instead breaks the source into grains that I can match against in real time, using MFCCs or another descriptor. Anyway, working my way through some more stuff, so thanks for the additional lead :)
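As a rough illustration of that uniform-slicing alternative to onset detection, here's a numpy sketch of the analysis step: chop the source into fixed-size, overlapping grains and compute one descriptor vector per grain. The grain size, hop, and toy RMS/centroid descriptor are assumptions for the sketch; in a real patch you'd compute MFCCs with the FluCoMa or MuBu analysis objects instead:

```python
import numpy as np

SR = 44100
GRAIN = 1024  # grain length in samples (~23 ms at 44.1 kHz)
HOP = 512     # hop size: 50% overlap between grains

def slice_grains(signal: np.ndarray) -> np.ndarray:
    """Chop a mono signal into fixed-size, overlapping grains
    (uniform slicing, no onset detection)."""
    starts = range(0, len(signal) - GRAIN + 1, HOP)
    return np.stack([signal[s:s + GRAIN] for s in starts])

def describe(grain: np.ndarray) -> np.ndarray:
    """Toy per-grain descriptor: RMS loudness plus spectral centroid.
    A stand-in for MFCCs, just to show the shape of the data."""
    mag = np.abs(np.fft.rfft(grain * np.hanning(len(grain))))
    freqs = np.fft.rfftfreq(len(grain), 1 / SR)
    centroid = float(freqs @ mag / (mag.sum() + 1e-12))
    rms = float(np.sqrt(np.mean(grain ** 2)))
    return np.array([rms, centroid])

# Build a corpus from one second of noise standing in for a field recording.
source = np.random.default_rng(1).normal(size=SR)
grains = slice_grains(source)
corpus = np.stack([describe(g) for g in grains])  # one row per grain
```

The resulting `corpus` array (one descriptor row per grain) is exactly the kind of table you'd then search against the live input's descriptors, frame by frame.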