Formant analysis

Mrboni:

Hey. Is there any way to analyse formants of human speech in Max?

Cheers

Matthias:

Hi there.

I think there's a way. I'm just starting to dig into spectral stuff in Max myself.

Check out Jean-Francois Charles's videos on YouTube. Here's one:
http://www.youtube.com/watch?v=VsjdzjOrrn4

And go to Cycling '74's share page, which hosts a link to his article on spectral synthesis in Max. You can also download example patches there.
https://cycling74.com/share.html

I haven't started yet. But maybe that could be a starting point for you.

Cheers,
Matt

Mrboni:

Thanks for the comment. That spectral resynthesis was nice.

What I want to do, though, is identify the formants in human speech in real time, so I can tell at any given moment which vowel sound is being produced.

I imagine it should be possible, given that it's fairly straightforward to generate those formant sounds artificially...?
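For what it's worth, the synthesis side really is simple: a pulse train through a cascade of resonant filters gives a vowel-like buzz. A minimal Python/numpy sketch (the /a/-like formant frequencies and bandwidths below are rough illustrative values, not measurements):

```python
import numpy as np

SR = 44100

def resonator(x, freq, bw, sr=SR):
    """Two-pole resonant filter centred on freq (Hz) with bandwidth bw (Hz)."""
    r = np.exp(-np.pi * bw / sr)               # pole radius from bandwidth
    theta = 2 * np.pi * freq / sr              # pole angle from centre frequency
    y = np.zeros_like(x)
    for n in range(len(x)):
        y1 = y[n - 1] if n > 0 else 0.0
        y2 = y[n - 2] if n > 1 else 0.0
        y[n] = x[n] + 2 * r * np.cos(theta) * y1 - r * r * y2
    return y

# glottal-like excitation: a 120 Hz impulse train, half a second long
n = np.arange(SR // 2)
vowel = (n % (SR // 120) == 0).astype(float)

# cascade of three resonators with rough /a/-like formants (illustrative values)
for f, bw in [(730, 90), (1090, 110), (2440, 170)]:
    vowel = resonator(vowel, f, bw)
vowel /= np.max(np.abs(vowel))                 # normalise for playback
```

The same cascade idea maps directly onto reson~ objects in Max; the analysis direction (recovering those frequencies from a recording) is the hard part.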

mudang:

The main difficulty is to separate the peaks of the pitch harmonics from the formant peaks. Two commonly used techniques are the cepstrum and LPC.
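The cepstral version of that separation can be sketched outside Max in a few lines of Python/numpy: low quefrencies carry the smooth formant envelope, while the pitch shows up as a peak at the period. The 2 ms cutoff below is an illustrative choice, not a standard:

```python
import numpy as np

def cepstral_split(frame, sr, cutoff_ms=2.0):
    """Real cepstrum of one frame: the low-quefrency part approximates the
    formant envelope; a peak above the cutoff gives the pitch period."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    ceps = np.fft.irfft(log_mag)
    cut = int(sr * cutoff_ms / 1000.0)          # quefrency cutoff in samples
    env_ceps = ceps.copy()
    env_ceps[cut:len(ceps) - cut] = 0.0         # lifter: keep low quefrencies only
    envelope = np.fft.rfft(env_ceps).real       # smoothed log-magnitude spectrum
    peak = cut + np.argmax(ceps[cut:len(ceps) // 2])
    return envelope, sr / peak                  # envelope plus an f0 estimate

# toy check: an impulse train with a period of exactly 210 samples at 44.1 kHz
sr = 44100
frame = np.zeros(2048)
frame[::210] = 1.0
envelope, f0 = cepstral_split(frame, sr)
```

On this toy input the cepstral peak lands on the 210-sample period (or a multiple of it); real speech needs averaging over frames, but the idea is the same.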

I remember an external that would directly output the vowels, but I can't recall its name... :(

Mrboni:

Thanks

I'll take a look at those techniques.

Anyone remember the name of that external?! I want.. :)

Jean-Francois Charles:

There used to be an lpc~ object in the IRCAM forum distribution. On maxobjects.com, it shows up as Mac OS 9 only.

Roald Baudoux:

There is a gbr.lpc object in the FTM/gabor stuff.

There was also a tap.lpc~ object in Tap Tools (around 2002). It could import LPC analysis files from Csound.

Mrboni:

Can LPC directly analyse the formants?

mudang:

LPC analysis gives you the coefficients of an IIR filter that models the frequency response of the vocal tract.
So you will still need a method to identify the formants in that spectrum (and a method to calculate the spectrum from the filter coefficients, which still puzzles me). But this should work a lot better than looking at the original speech spectrum.

I've never tried it myself but from what I've read, formant tracking is no easy task. There are also approaches using neural networks and dynamic programming...

BTW, there's a new collection of real-time LPC externals by Mark Cartwright:
http://www.markcartwright.com/projects/lpcToolkit/

Mrboni:

Thanks, mudang. Could this be used to estimate the closest formant match, based on simulated examples of each formant? http://ftm.ircam.fr/index.php/Gesture_Follower

Drkovorkian:

Hey Mrboni. How far did you get with this in the end? I'm doing something extremely similar, and after a little research I seem to have reached the same conclusion as you.

Sound in > LPC > Gesture_Follower

Wondering how this worked out for you.

Mrboni:

Hi Drkovorkian,

I ended up running out of time for the event this was for, so I didn't get very far.

I'd be interested in sharing ideas and progress on this. What do you plan to use it for?

Have you played with either the LPC externals or the Gesture Follower yet?

Will

chachaching:

Hello,

I am working on a project now that needs to connect microphone > LPC data > GF. Has anyone had experience with this?

Thanks

chachaching:

Has anyone worked on a project like this?

Matthew Gantt:

Chiming in with nothing helpful but enthusiasm.

I've been listening to a lot of the Paul DeMarinis album Music as a Second Language, which is pretty LPC-heavy, and I'm curious what's out there nowadays, toolset-wise:

http://www.lovely.com/titles/cd3011.html