Sinewave speech, formant tracking, filtercoeff~ in reverse, etc?
So, this is a bit of an open-ended inquiry, as I can already see a few different potential ways to approach the problem… Basically, I’m looking for a way to experiment with the "sinewave speech" effect described at…
…which entails tracking formants (their approach is LPC-based, I believe) and producing a coarse approximation of speech from a handful of sine waves. Ideally, I’d like to end up with an abstraction that can be fed recorded speech and will generate (not necessarily in real time) a list of frequencies and amplitudes for each "track." I understand that a tool like Praat (http://www.fon.hum.uva.nl/praat/) will give me either rendered audio or filter coefficients, but probably not the frequencies themselves? Can anyone speak to this?
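In case it helps to see what the LPC tools compute under the hood, here is a minimal sketch of the standard autocorrelation method (Hamming window plus Levinson-Durbin recursion) in plain Python. It is not Max, and it is not the code of any particular external. The frame length, the window, and the order are assumptions. A common rule of thumb is an order of about 2 + sr/1000 (e.g. 12 at 10 kHz), applied to 20-30 ms frames.

```python
import math

def autocorr(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion for the LPC polynomial
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    Returns (a, err): a[0] == 1.0, err is the final prediction-error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(frame, order):
    """Autocorrelation-method LPC of one frame, using a Hamming window."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    return levinson_durbin(autocorr(windowed, order), order)
```

The coefficients come out in the "1 + a1 z^-1 + ..." convention. These are the numbers an inverse-filtercoeff~ step would have to turn back into frequencies.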
Having searched the forums, I’m aware of the externals that exist for LPC-related work (Mark Cartwright’s LPC Toolkit, Gabor/FTM), but, LPC being what it is, they seem geared toward generating filter coefficients. That leads me to wonder whether there’s an object that works like an inverse filtercoeff~: in other words, how might one derive a set of higher-level filter specifications (frequency, amplitude, Q) from a set of lower-level ones (coefficients)? Is anyone working along similar lines? If so, please excuse the remedial nature of the question; I’m no DSP head.
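The usual answer to the "inverse filtercoeff~" question for an all-pole (LPC) filter is to find the roots (poles) of the coefficient polynomial. Each complex pole pair r·e^(±jθ) is one resonance: θ gives its frequency, the radius r gives its bandwidth (≈ -ln(r)·sr/π), and Q is frequency/bandwidth. The sketch below does this in plain Python with a Durand-Kerner root finder. The 90 Hz / 400 Hz cutoffs for discarding non-formant poles are heuristic assumptions. resonator_denominator and poly_mul are hypothetical helpers for trying it out, not part of any Max object.

```python
import cmath
import math

def poly_roots(coeffs, iterations=500):
    """Durand-Kerner iteration: all complex roots of
    coeffs[0]*z^p + coeffs[1]*z^(p-1) + ... + coeffs[p]."""
    c = [x / coeffs[0] for x in coeffs]  # make the polynomial monic
    p = len(c) - 1
    # Non-symmetric starting guesses, so complex roots can emerge.
    roots = [(0.4 + 0.9j) ** k for k in range(p)]
    for _ in range(iterations):
        for i in range(p):
            z = roots[i]
            num = sum(ck * z ** (p - k) for k, ck in enumerate(c))
            den = 1.0 + 0j
            for j in range(p):
                if j != i:
                    den *= z - roots[j]
            roots[i] = z - num / den  # in-place update
    return roots

def lpc_to_formants(a, sr, min_freq=90.0, max_bw=400.0):
    """Turn A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p into a sorted list of
    (frequency_hz, bandwidth_hz, gain) triples. Multiplying A(z) by z^p keeps
    the coefficient order, so `a` goes to poly_roots unchanged."""
    out = []
    for z in poly_roots(a):
        if z.imag <= 0:  # keep one root per conjugate pair; drop real roots
            continue
        freq = cmath.phase(z) * sr / (2 * math.pi)
        bw = -math.log(abs(z)) * sr / math.pi  # pole radius -> approx. -3 dB bandwidth
        if freq < min_freq or bw > max_bw:  # heuristic: skip spurious/broad poles
            continue
        w = 2 * math.pi * freq / sr
        resp = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        out.append((freq, bw, 1.0 / abs(resp)))  # relative gain; Q = freq / bw
    return sorted(out)

def resonator_denominator(freq, bw, sr):
    """Hypothetical helper: 1 - 2r cos(theta) z^-1 + r^2 z^-2 for one pole pair."""
    r = math.exp(-math.pi * bw / sr)
    theta = 2 * math.pi * freq / sr
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out
```

Round-tripping two resonators (500 Hz / 80 Hz and 1500 Hz / 120 Hz at sr = 10000) through poly_mul and lpc_to_formants should give those frequencies and bandwidths back. For a single biquad, assuming biquad~'s difference equation y[n] = a0·x[n] + a1·x[n-1] + a2·x[n-2] - b1·y[n-1] - b2·y[n-2], the feedback pair corresponds to [1, b1, b2]. The gain figure then ignores the feedforward coefficients.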
Finally, I’m aware that LPC analysis and formant tracking are not equivalent tasks. My real aim here is formant tracking, but ultimately, I’m going more for "awesomeness" than "correctness" and am more than willing to audition results from either column.
I’ve had good results using Miller Puckette’s sigmund~, which has a tracks output. The help patch includes an example, and you can reduce the number of partials to just a few and still understand what the resynthesised voice is saying.
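On the resynthesis side, here is a sketch in plain Python of what a sinewave-speech renderer does with per-frame track data. Each track is assumed to be a list of (frequency, amplitude) pairs, one per analysis frame. Values are interpolated linearly across each hop and summed as phase-continuous sines. The data layout, hop size, and linear interpolation are assumptions, roughly what a bank of cycle~ objects fed by line~ ramps would do in Max. They do not describe sigmund~'s own output format.

```python
import math

def synth_tracks(tracks, hop, sr):
    """Sum one phase-continuous sine per track.

    tracks: list of tracks, each a list of (freq_hz, amp) pairs, one per frame;
            all tracks must have the same number of frames. Use amp 0.0 for
            frames where a track is silent.
    hop:    samples between analysis frames.
    sr:     sample rate in Hz.
    """
    n_frames = len(tracks[0])
    out = [0.0] * (hop * (n_frames - 1))
    two_pi = 2 * math.pi
    for track in tracks:
        phase = 0.0  # running phase keeps each sine continuous across frames
        for f in range(n_frames - 1):
            f0, a0 = track[f]
            f1, a1 = track[f + 1]
            for i in range(hop):
                t = i / hop  # linear interpolation between frame f and f + 1
                freq = f0 + (f1 - f0) * t
                amp = a0 + (a1 - a0) * t
                phase = math.fmod(phase + two_pi * freq / sr, two_pi)
                out[f * hop + i] += amp * math.sin(phase)
    return out
```

Dropping tracks from the list (keeping, say, the three or four strongest) is the code equivalent of reducing the number of partials in the help patch.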
Of course! Yeah, I should be able to make this work with sigmund~. Thanks very much for the tip!
Sigmund~ does indeed get me in the neighborhood of the intended results. Do I assume correctly that the high frequency "flutter" I’m getting across all tracks is the result of the algorithm being thrown off by the unpitched component of the speech? It’s turning out to be more difficult than expected to reduce these artifacts by tweaking the parameters of sigmund~.
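One common post-processing trick for that kind of frame-to-frame jitter is sketched below. It is a general technique, not a sigmund~ parameter. It runs a short median filter over each track's frequency values and zeroes frames whose amplitude falls below a floor, so low-level unvoiced frames stay silent instead of wandering. Tracks are assumed to be lists of per-frame (frequency, amplitude) pairs. The window width and min_amp threshold are hypothetical starting values to tune by ear.

```python
import statistics

def median_smooth(values, width=5):
    """Running median over `width` frames (odd); edges use a shrunken window."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(statistics.median(values[lo:hi]))
    return out

def clean_track(track, width=5, min_amp=0.01):
    """Median-filter a track's frequencies and gate its quiet frames to 0."""
    freqs = median_smooth([f for f, _ in track], width)
    amps = [a if a >= min_amp else 0.0 for _, a in track]
    return list(zip(freqs, amps))
```

A median filter removes isolated one- or two-frame spikes without smearing genuine formant transitions as much as a moving average would.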
Of course, this is a pretty remedial question, so feel free to point me to relevant reading on the topic. Just trying to get a better understanding of what I’m working with.