speech synthesis using fft, or some other spectral analysis

aceslowman's icon

I am looking to synthesize vowel tones without necessarily going the granular route, so I have been looking into using FFT. I have only dealt with FFT at the level of fixed sine waves and FM synthesis, where it worked well, but how exactly would one go about taking in a sound file and finding the frequency content of a segment of it? The FFT tutorial expects a cosine as input, and I can't wrap my mind around using a recorded sound instead. I'm still learning these concepts, but I feel that measuring the frequency content and using filters to approximate it should give me a tone that, at least in some respects, resembles my source audio.

Any resources or advice would be hugely appreciated.

aceslowman's icon
Max Patch

Here is what I have at this point. It seems to be working, but I have no good way of judging the accuracy of the output...

metamax's icon

I have no idea if it's useful for analyzing formants, but fiddle~ was just updated to 64-bit. sigmund~ might also be relevant. Both can be downloaded here: http://vboehm.net/downloads/

Roman Thilenius's icon

what fft does well is differentiate between more tonal and more atonal content.

but that doesn't give you a vocal synth. and i would avoid fft altogether in a sound generator because of the latency it introduces.

the process of analysis should happen offline. you can use SDIF stuff or just a simple filterbank and then use the found coefficients to mimic the formants using some kind of resonators or even biquad filters.
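
The offline analysis step can be sketched outside Max. The Python below is a minimal illustration, not the method used in any patch in this thread: it windows one segment of a recording, takes the FFT magnitude, smooths it across frequency so individual harmonics blur into an envelope, and returns the strongest peaks as rough formant candidates. The function name `spectral_peaks`, its parameters, and the 150 Hz smoothing default are assumptions made for the example.

```python
import numpy as np

def spectral_peaks(segment, sample_rate, n_peaks=3, smooth_hz=150.0):
    """Return the n_peaks strongest local maxima (in Hz) of a smoothed spectrum."""
    segment = np.asarray(segment, dtype=float)

    # Window the segment to reduce leakage, then take the magnitude spectrum.
    windowed = segment * np.hanning(len(segment))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)

    # Smooth across frequency with a Hann-shaped kernel (assumed width: smooth_hz).
    bin_hz = freqs[1] - freqs[0]
    width = max(1, int(round(smooth_hz / bin_hz)))
    kernel = np.hanning(width + 2)[1:-1]
    kernel = kernel / kernel.sum()
    envelope = np.convolve(magnitude, kernel, mode="same")

    # A bin is a peak if it is higher than its left neighbour
    # and at least as high as its right neighbour.
    interior = envelope[1:-1]
    is_peak = (interior > envelope[:-2]) & (interior >= envelope[2:])
    peak_bins = np.nonzero(is_peak)[0] + 1

    # Keep the strongest peaks and report them in ascending frequency.
    order = np.argsort(envelope[peak_bins])[::-1][:n_peaks]
    return np.sort(freqs[peak_bins[order]])
```

The returned frequencies could then be stored and used as centre frequencies for resonators or biquads. Peak picking on a smoothed FFT is crude, and linear prediction (LPC) is a more common approach to formant estimation, so treat the numbers as a starting point.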

Mark Durham's icon

Here is a formant filtering patch if you go that route, adapted from one of Andy Farnell's Pd patches if I remember correctly.

Should put a smile on your face if nothing else...

Formant-Synth-1.0.maxpat
Max Patch
aceslowman's icon

Roman Thilenius, my intent with fft~ was really to find the filter coefficients somehow and then log them into a separate dict. I'm building a separate tone "designer" first, where the data is stored, and I'll focus on latency and efficiency in another patch. I've never dealt with SDIF.

And Mark Durham, that definitely put a smile on my face. It has a lot to offer and is really close to what I need, so I'll be dissecting it as well. Do you know how the formant values that are passed in were calculated?

Roman Thilenius's icon

mark, you should really add a control for the pulse width of the exciter, it is essential.

Roman Thilenius's icon

& your vibrato seems to only scale upwards ;)

aceslowman's icon
Max Patch

How exactly does that exciter work? I'm referring specifically to the *~ 30 and /~ 1 objects connected to the cycle~. It also seems like fffb~ really does simplify things; hopefully there isn't much of a tradeoff between that and just using multiple reson~ objects.
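
For comparison outside Max, the general idea of a pulse exciter feeding several parallel resonators can be sketched in Python. This is only an illustration under stated assumptions: it is not a transcription of the patch above, it does not reproduce the internals of fffb~ or reson~, and the function names, the two-pole resonator form, and the formant values in the usage line are placeholders chosen for the example rather than measured data.

```python
import numpy as np

def pulse_train(f0, pulse_width, sample_rate, n_samples):
    """Rectangular pulse train; pulse_width is the high fraction of each period (0..1)."""
    phase = (np.arange(n_samples) * f0 / sample_rate) % 1.0
    return np.where(phase < pulse_width, 1.0, 0.0)

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = g*x[n] + 2r*cos(w)*y[n-1] - r^2*y[n-2]."""
    r = np.exp(-np.pi * bandwidth_hz / sample_rate)  # pole radius from bandwidth
    w = 2.0 * np.pi * center_hz / sample_rate        # pole angle from centre frequency
    a1, a2 = 2.0 * r * np.cos(w), -r * r
    gain = 1.0 - r                                   # rough level compensation
    out = np.zeros(len(signal))
    y1 = y2 = 0.0
    for n, x in enumerate(signal):
        y = gain * x + a1 * y1 + a2 * y2
        out[n] = y
        y2, y1 = y1, y
    return out

def vowel(f0, formants, sample_rate=44100, seconds=0.5, pulse_width=0.1):
    """Sum parallel resonators driven by one pulse train.

    formants is a list of (center_hz, bandwidth_hz, amplitude) tuples.
    """
    n = int(sample_rate * seconds)
    exciter = pulse_train(f0, pulse_width, sample_rate, n)
    exciter -= exciter.mean()  # remove the DC offset of the pulse train
    mix = sum(amp * resonator(exciter, hz, bw, sample_rate)
              for hz, bw, amp in formants)
    return mix / np.max(np.abs(mix))  # normalise to a peak of 1.0
```

A call such as `vowel(110, [(700, 80, 1.0), (1200, 90, 0.5), (2600, 120, 0.25)])` returns half a second of samples with illustrative, roughly "ah"-like settings. Changing `pulse_width` alters the balance of harmonics reaching the resonators, which is one way to hear why a pulse-width control on the exciter matters.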