Re: Extremely precise sonogram?


August 10, 2010 | 3:41 pm

Well, thanks again for your comments and tips.
Regarding your points:

> 1 -
You're right: iZotope RX does in fact use this "zero padding" method, as I just read in the hint. Attached below is another example of the power of this misnamed "frequency overlap".
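For anyone following along, here is a minimal pure-Python sketch of what zero padding does to a single analysis frame. It assumes a plain rectangular window and a made-up 64-sample test tone; it is only an illustration of the general technique, not how RX implements it. Padding makes the bins closer together, so the peak lands nearer the true frequency, but it does not make the peak itself any narrower.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Direct DFT magnitudes for bins 0..N/2 of a real-valued frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def zero_padded_spectrum(frame, pad_factor):
    """Append zeros so the transform is pad_factor times longer.

    The bins end up pad_factor times closer together (an interpolated view of
    the same spectrum), but the width of each peak, i.e. the real frequency
    resolution, is still set by the original, unpadded frame length.
    """
    padded = list(frame) + [0.0] * (len(frame) * (pad_factor - 1))
    return dft_magnitudes(padded)


# Made-up test tone: 64 samples at 8 kHz, a quarter of a bin above bin 8
# (bin spacing is 8000 / 64 = 125 Hz, so 8.25 bins = 1031.25 Hz).
sr = 8000.0
freq = 1031.25
frame = [math.sin(2 * math.pi * freq * i / sr) for i in range(64)]

plain = zero_padded_spectrum(frame, 1)   # 33 bins, 125 Hz apart
padded = zero_padded_spectrum(frame, 4)  # 129 bins, 31.25 Hz apart

peak_plain = max(range(len(plain)), key=plain.__getitem__) * sr / 64
peak_padded = max(range(len(padded)), key=padded.__getitem__) * sr / 256
```

Here `peak_plain` comes out at 1000 Hz (the nearest 125 Hz bin), while `peak_padded` hits 1031.25 Hz because that frequency happens to fall on the finer grid.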

> 2 -
You're right that averaging spectra from VERY different FFT sizes will *reduce* accuracy. Extreme window sizes (like 256 or 8192) clearly reduce accuracy dramatically: along y (frequency) for 256, along x (time) for 8192. In fact I was looking more closely at "shafqat.aif" in RX and, though it may depend on the sound being analyzed, the more I look at it, the more I feel that a mixed sonogram built from different Fourier window sizes should stay around 1024. Then perhaps a DFT (discrete Fourier transform) should be used instead of an FFT, to mix the results of non-power-of-2 window sizes between 800 and 1600 samples.
(But wow, according to Wikipedia, a direct DFT can be about 100 times slower than an FFT…)
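A small sketch of the numbers behind point 2, assuming a 44.1 kHz sample rate (not stated in the thread). `resolution()` shows the y/x trade-off for the window sizes mentioned, `dft_bin()` evaluates a direct DFT bin for any frame length (here 1200 samples), and `direct_vs_fft_ops()` is a rough operation-count estimate that ignores constant factors. All three function names are made up for this example.

```python
import cmath
import math


def resolution(n, sr=44100.0):
    """Bin spacing in Hz (the y axis) and window length in ms (the x axis)."""
    return sr / n, 1000.0 * n / sr


def dft_bin(frame, k):
    """One bin of a direct DFT; works for any frame length, power of 2 or not."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))


def direct_vs_fft_ops(n):
    """Rough cost ratio: direct DFT (~N^2) vs radix-2 FFT (~N log2 N)."""
    return (n * n) / (n * math.log2(n))


for n in (256, 1024, 8192):
    hz, ms = resolution(n)
    print(f"N={n:5d}: {hz:7.2f} Hz per bin, {ms:7.2f} ms per window")

# A 1200-sample window (not a power of 2): a cosine with exactly 10 cycles
# in the frame puts its energy in bin 10 (and its mirror image, bin 1190).
frame = [math.cos(2 * math.pi * 10 * i / 1200) for i in range(1200)]
peak = abs(dft_bin(frame, 10))      # close to N/2 = 600
neighbour = abs(dft_bin(frame, 9))  # close to 0
ratio = direct_vs_fft_ops(1024)     # 102.4, i.e. roughly "100 times"
```

Note that a direct DFT is not the only option: FFT algorithms also exist for non-power-of-2 lengths (mixed-radix for sizes like 1200 = 2^4 × 3 × 5^2, Bluestein's algorithm for arbitrary sizes), so an 800 to 1600 sample window does not necessarily have to pay the full N^2 cost.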

> 3 -
Though I realize, while googling "parabolic peak interpolation", that the math sometimes goes over my head in this discussion, I want to point out that the idea of using one method for pitched content and another for noise content does not convince me at all. My goal has nothing to do with aesthetic sonogram images (except for the UFO joke above). If you use different methods for pitched and noise content, you assume that pitch and noise in your sound are like black and white, without any greyscale in between. But it's never like this (or only for ugly electronic sounds). At the very beginning of a soft violin note, you have only noise; then the pitches of the harmonics gradually emerge from the noise, until they become really clear when the bow presses harder on the string.
You will never be able to say "this is pitched content, and that is noise content". Again, all you can have are probabilities. OK, a thin line of 97% probability surrounded by an area of 1% probability can be called "pitched content", and a big clean area of 10% probability can be called "noise", but between these two extremes, as you can hear in the violin synthesis sample* attached below, it is clearly not always black or white.
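Since "parabolic peak interpolation" came up: the usual form fits a parabola through a spectral peak bin and its two neighbours (typically on dB magnitudes) and takes the vertex as the refined peak position and height. A minimal sketch, assuming `k` is already a local maximum with a neighbour on each side; the function name and the test values are made up for illustration.

```python
def parabolic_peak(mags_db, k):
    """Fit a parabola through bins k-1, k, k+1 of a dB magnitude spectrum.

    Returns (fractional_bin, peak_db). Assumes bin k is a local maximum.
    """
    a, b, c = mags_db[k - 1], mags_db[k], mags_db[k + 1]
    denom = a - 2 * b + c
    if denom == 0:  # flat top: no curvature, keep the bin as-is
        return float(k), b
    p = 0.5 * (a - c) / denom  # vertex offset in bins, within [-0.5, 0.5]
    return k + p, b - 0.25 * (a - c) * p


# Made-up dB values lying exactly on a parabola whose top is at bin 2.3, 5.0 dB.
mags_db = [0.0, 3.31, 4.91, 4.51, 0.0]
bin_est, peak_db = parabolic_peak(mags_db, 2)  # approximately (2.3, 5.0)
```

The fractional bin converts to Hz as `bin_est * sr / n` for an `n`-point transform at sample rate `sr`. On a real windowed sinusoid the parabola is only an approximation of the peak shape, so the estimate is close rather than exact.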

> 4 - http://en.wikipedia.org/wiki/Multitaper
Reading this, I also feel a bit lost in the math…
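Stripped of the math, the multitaper idea is: compute several spectra of the same frame, each with a different mutually orthogonal taper (window), and average them. This lowers the variance of the estimate at the cost of some frequency smearing. The sketch below swaps in sine tapers, a simpler orthogonal family that is often used instead of the Slepian (DPSS) tapers the article describes, because DPSS tapers need an eigenvalue solver. The function names and the test tone are assumptions for illustration.

```python
import cmath
import math


def sine_tapers(n, k_count):
    """Orthonormal sine tapers: sqrt(2/(n+1)) * sin(pi*k*(i+1)/(n+1)), k = 1..k_count."""
    scale = math.sqrt(2.0 / (n + 1))
    return [
        [scale * math.sin(math.pi * k * (i + 1) / (n + 1)) for i in range(n)]
        for k in range(1, k_count + 1)
    ]


def multitaper_psd(frame, k_count=4):
    """Average the power spectra (bins 0..N/2) obtained with k_count different tapers."""
    n = len(frame)
    psd = [0.0] * (n // 2 + 1)
    for taper in sine_tapers(n, k_count):
        tapered = [x * w for x, w in zip(frame, taper)]
        for k in range(n // 2 + 1):
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(tapered))
            psd[k] += abs(coeff) ** 2 / k_count
    return psd


# Made-up test tone: exactly 8 cycles in a 64-sample frame (bin 8).
frame = [math.sin(2 * math.pi * 8 * i / 64) for i in range(64)]
psd = multitaper_psd(frame, k_count=4)
best = max(range(len(psd)), key=psd.__getitem__)  # near bin 8, spread over a few bins
```

Using more tapers gives a smoother, lower-variance spectrum but a wider peak, which is the same accuracy trade-off as in point 2, just in a different form.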

A bit off-topic: I wanted to say again that sigmund~ is a really nice object. Perhaps the polyphonic pitch-tracking idea I tried to describe above ("harmonically multiply" everything together) could be better achieved with sigmund~.
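The "harmonically multiply everything together" idea looks close to the classic harmonic product spectrum (HPS): multiply the magnitude spectrum by copies of itself sampled at 2k, 3k, and so on, so that a bin only scores high if its integer multiples also carry energy. Classic HPS estimates a single pitch per frame; a polyphonic version would need more than this sketch shows, and it says nothing about how sigmund~ works internally. The spectrum values below are made up.

```python
def harmonic_product_spectrum(mags, n_harmonics=3):
    """hps[k] = mags[k] * mags[2k] * ... * mags[n_harmonics * k].

    Only bins k whose highest multiple still falls inside the spectrum are kept.
    """
    limit = (len(mags) - 1) // n_harmonics + 1
    hps = []
    for k in range(limit):
        product = 1.0
        for h in range(1, n_harmonics + 1):
            product *= mags[h * k]
        hps.append(product)
    return hps


# Hypothetical spectrum: weak fundamental at bin 10, louder 2nd and 3rd
# harmonics at bins 20 and 30, low-level noise floor everywhere else.
mags = [0.1] * 64
mags[10], mags[20], mags[30] = 0.5, 1.0, 0.8
hps = harmonic_product_spectrum(mags, n_harmonics=3)
f0_bin = max(range(len(hps)), key=hps.__getitem__)  # bin 10, not the louder bin 20
```

The product rewards the bin whose whole harmonic series is present, which is why the weak fundamental wins over its louder second harmonic here.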

* These "blurred" harmonics were made using this: http://cycling74.com/forums/topic.php?id=20980
