Sinewave speech, formant tracking, filtercoeff~ in reverse, etc?

catniptwinz's icon

So, this is a bit of an open-ended inquiry, as I can already see a few different potential ways to approach the problem... Basically, I'm looking for a way to experiment with the "sinewave speech" effect described at...

...which entails tracking formants (their approach is LPC-based, I believe) and producing a coarse approximation of speech from a handful of sine waves. Ideally, I'd like to end up with an abstraction that can be fed recorded speech and will generate (not necessarily in real time) a list of frequencies and amplitudes for each "track." I understand that a tool like Praat (http://www.fon.hum.uva.nl/praat/) will give me either rendered audio or filter coefficients, but probably not the frequencies themselves? Can anyone speak to this?
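For reference, the resynthesis half of the effect is the easy part once the tracks exist. Below is a minimal Python sketch, not the method used by the page linked above. It assumes the analysis has already produced one list of (frequency, amplitude) pairs per frame. The function name `sinewave_resynth`, the frame layout, and the hop size are all made up for illustration.

```python
import math

def sinewave_resynth(frames, sample_rate=44100, hop=512):
    """Sum one sine oscillator per track.

    frames: list of frames; each frame is a list of (freq_hz, amplitude)
    pairs, one pair per track (hypothetical layout, same track count
    in every frame).
    """
    n_tracks = len(frames[0])
    phases = [0.0] * n_tracks  # running phase per oscillator
    out = []
    for i, frame in enumerate(frames):
        # Interpolate toward the next frame (the last frame holds its values).
        nxt = frames[min(i + 1, len(frames) - 1)]
        for n in range(hop):
            t = n / hop
            sample = 0.0
            for k in range(n_tracks):
                f = frame[k][0] + t * (nxt[k][0] - frame[k][0])
                a = frame[k][1] + t * (nxt[k][1] - frame[k][1])
                # Accumulate phase so frequency changes do not click.
                phases[k] += 2.0 * math.pi * f / sample_rate
                sample += a * math.sin(phases[k])
            out.append(sample)
    return out

# Three steady "formant" tracks held for four frames.
demo_frames = [[(500.0, 0.5), (1500.0, 0.3), (2500.0, 0.2)]] * 4
demo_audio = sinewave_resynth(demo_frames)
```

The linear interpolation between frames is just one simple choice; any smoother ramp would work as well.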

Having searched the forums, I'm aware of the externals that exist for doing LPC-related stuff (Mark Cartwright's LPC Toolkit, Gabor/FTM), but, LPC being what it is, they seem geared toward generating filter coefficients. That leads me to wonder whether there's an object that works like an inverse filtercoeff~. In other words, how might one derive a set of higher-level filter specifications (frequency, amplitude, Q) from a set of lower-level ones (coefficients)? Anyone working along similar lines? If you are, please excuse the remedial nature of the question; I'm no DSP head.
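For a single two-pole section, the "inverse filtercoeff~" idea can be sketched by hand: find the complex pole pair of the denominator, then read the center frequency off the pole angle and the bandwidth off the pole radius. The Python below is a sketch under assumptions, not an existing Max object. It assumes the denominator is written as 1 + fb1*z^-1 + fb2*z^-2 (if I remember Max's biquad~ convention right, these would be its b1 and b2 feedback coefficients), and the function name `pole_pair_to_formant` is hypothetical. It says nothing about amplitude, which also depends on the feedforward coefficients.

```python
import math

def pole_pair_to_formant(fb1, fb2, sample_rate=44100.0):
    """Estimate frequency, bandwidth and Q from two feedback coefficients.

    Assumes the denominator 1 + fb1*z^-1 + fb2*z^-2.
    Returns None when the poles are real (no resonance to report).
    """
    if fb1 * fb1 - 4.0 * fb2 >= 0.0:
        return None  # real poles: not a resonant pair
    radius = math.sqrt(fb2)                    # |pole|
    angle = math.acos(-fb1 / (2.0 * radius))   # pole angle in radians
    freq_hz = angle * sample_rate / (2.0 * math.pi)
    # Standard approximation: 3 dB bandwidth from the pole radius.
    bandwidth_hz = -math.log(radius) * sample_rate / math.pi
    q = freq_hz / bandwidth_hz if bandwidth_hz > 0.0 else float("inf")
    return {"freq_hz": freq_hz, "bandwidth_hz": bandwidth_hz, "q": q}

# Round trip: build a resonance at 1 kHz with pole radius 0.99, then recover it.
_theta = 2.0 * math.pi * 1000.0 / 44100.0
demo = pole_pair_to_formant(-2.0 * 0.99 * math.cos(_theta), 0.99 ** 2)
```

A full LPC frame is a higher-order polynomial, so it would need a general root finder (for example `numpy.roots`) before the same angle and radius conversion is applied to each complex root.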

Finally, I'm aware that LPC analysis and formant tracking are not equivalent tasks. My real aim here is formant tracking, but ultimately, I'm going more for "awesomeness" than "correctness" and am more than willing to audition results from either column.

Thoughts?

oli larkin's icon

I've had good results using Miller Puckette's sigmund~, which has a tracks output. There is an example in the help patch, and you can reduce the number of partials to just a few and still understand what a resynthesised voice is saying.

commathe's icon

Yeah, I second what Oli is saying there. It seems like sigmund~ does almost exactly what you want. There is a download link over here:
http://crca.ucsd.edu/~tapel/software.html

catniptwinz's icon

Of course! Yeah, I should be able to make this work with sigmund~. Thanks very much for the tip!

catniptwinz's icon

Sigmund~ does indeed get me in the neighborhood of the intended results. Do I assume correctly that the high frequency "flutter" I'm getting across all tracks is the result of the algorithm being thrown off by the unpitched component of the speech? It's turning out to be more difficult than expected to reduce these artifacts by tweaking the parameters of sigmund~.

Of course, this is a pretty remedial question, so feel free to point me to relevant reading on the topic. Just trying to get a better understanding of what I'm working with.
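Whatever the cause of the flutter turns out to be, one generic way to tame frame-to-frame jumps is to median-filter each frequency track after analysis. The Python below is only a sketch of that idea; it is not a sigmund~ feature, and the function name `median_smooth` and the window width are assumptions for illustration.

```python
import statistics

def median_smooth(values, width=5):
    """Median-filter one track; edges are padded by repeating the end values."""
    half = width // 2
    last = len(values) - 1
    smoothed = []
    for i in range(len(values)):
        # Clamp indices so the window stays full at both ends.
        window = [values[min(max(j, 0), last)]
                  for j in range(i - half, i + half + 1)]
        smoothed.append(statistics.median(window))
    return smoothed

# A slowly rising track with two single-frame outliers.
demo_track = [500.0, 502.0, 1900.0, 505.0, 507.0, 2100.0, 510.0]
demo_smoothed = median_smooth(demo_track, width=3)
```

A median filter removes isolated spikes without smearing genuine jumps the way an average would, at the cost of ignoring any event shorter than about half the window.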

chachaching's icon

catniptwinz,

I am trying to do a similar thing with sigmund~. I want to track incoming formants and, like you, I am looking more for awesomeness than correctness.

It would be really helpful if you could share your findings.

Peter McCulloch's icon

Check out zsa in the package manager. Cepstral analysis can be used to track formants, among other things.

chachaching's icon

Peter McCulloch,

Could you show me how you use zsa for this, please?

chachaching's icon

zsa.mfcc is what I want; the question is just how to use it correctly.