Alexandre

(Posted on the Jitter forum, but it's meant for the MaxMSP forum.)

I want an extremely precise sonogram, not just 2 or 3 times more precise than the current one. I want a million-pixel sonogram, with both time and frequency precision. I want to see the perfectly fine shape of each harmonic, not a blurry pâté...

I don't care if it takes 95% of the CPU, or even if it needs 2 minutes to compute a 2-second visualization.

How to get there? Jitter? 32-times upsampling > FFT? What about the wavelet transform*?

Note: I tried some sonogram software on the Mac; it wasn't much prettier than [sonogram] in Max...

Is there any software that already does this?

* http://en.wikipedia.org/wiki/Short-time_Fourier_transform#Resolution_issues



A million-pixel sonogram. Let's see. Suppose we work with a sampling rate of 44100 Hz. 1024 frequency bins over a range of 22050 Hz means an FFT size of 2048 samples, i.e. 46 ms.

If you want 1048576 frequency bins over 22050 Hz, that means an FFT size of 2097152 samples, i.e. about 47.5 seconds. Definitely possible, but not straightforward in Max.
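The arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers in this post; the function name `fft_tradeoff` is made up for the example.

```python
def fft_tradeoff(n_bins, sample_rate=44100):
    """Return (fft_size, bin_width_hz, window_seconds) for n_bins bins up to Nyquist."""
    fft_size = 2 * n_bins                     # a real FFT of size N gives N/2 usable bins
    bin_width_hz = sample_rate / fft_size     # frequency resolution
    window_seconds = fft_size / sample_rate   # time span covered by one analysis frame
    return fft_size, bin_width_hz, window_seconds

for n_bins in (1024, 1048576):
    size, width, seconds = fft_tradeoff(n_bins)
    print(f"{n_bins} bins: FFT size {size}, {width:.4f} Hz per bin, {seconds:.3f} s per frame")
```

Every time the number of bins doubles, the bin width halves and the frame duration doubles, which is the tradeoff discussed in the rest of this thread.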

Oh, but you want both a great frequency AND a great time resolution. À la fois le beurre et l'argent du beurre (a French saying, roughly "to have your cake and eat it too"). Well, this is simply impossible. It's the audio / wave equivalent of Heisenberg's uncertainty principle.

But by choosing a suitable analysis window size, you can get really nice sonograms, within Max, or with free software like Raven Lite and others. If you keep a window size of 256, for instance, you might not get what you want.


Didn't SoundHack have something like 265,000 frames? But I don't see how more frequency bands than 2 times your monitor height would be useful...


On a related note, I would absolutely love to know what all this is. Their sonograms look amazing.

http://www.izotope.com/support/center/index.php?x=&mod_id=2&id=388


iZotope tools are wonderful. But even they didn't break the uncertainty principle. For instance, from the page linked by AudioMatt:

"Auto-Adjustable STFT...if you zoom in horizontally (time) you'll see that percussive sounds and transients will be more clearly defined. When you zoom in vertically (frequency), you'll see individual musical notes and frequency events will appear more clearly defined."

Yes, that's exactly the point: you can get either a good time resolution or a good frequency resolution. You could work on replicating their idea (linking the FFT size to the zoom level) in Max. You could even work on a "multi-resolution" analysis (from their page: "spectrogram with better frequency resolution at low frequencies and better time resolution at high frequencies"): you could calculate the spectra with two different FFT sizes, then use the data from one or the other when displaying the low or the high frequencies...
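Outside Max, the two-FFT-size idea could be sketched like this in Python with NumPy. It is not iZotope's algorithm: the split frequency, the two FFT sizes, the hop of a quarter window and the function names are all assumptions chosen for the example.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude STFT with a Hann window; rows are frames, columns are bins."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def multires_columns(x, sr, split_hz=1000.0, big=4096, small=256):
    """Low band taken from the big FFT, high band taken from the small FFT."""
    low = stft_mag(x, big, big // 4)        # fine frequency grid, few frames
    high = stft_mag(x, small, small // 4)   # coarse frequency grid, many frames
    low_freqs = np.fft.rfftfreq(big, 1.0 / sr)
    high_freqs = np.fft.rfftfreq(small, 1.0 / sr)
    return low[:, low_freqs < split_hz], high[:, high_freqs >= split_hz]
```

The two returned arrays have different numbers of rows and columns, so a display would still have to stretch each one onto its own part of the picture. Each band keeps the tradeoff of its own FFT size.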

Sorry for the late response, but I was looking a bit more at RavenPro and at your quite interesting FFT/Jitter tutorials on the share pages.

First I have to explain a little more why I want deep horizontal AND vertical resolution: I'm working on additive synthesis and would like to examine fine details in the sounds of acoustic instruments like bassoon, clarinet, contrabass, etc., to get inspiration on ways to reproduce them in an expressive 200-harmonics-with-blur-factors additive synthesis system for a two-pen Wacom screen. (see below**)

>> Oh, but you want both a great frequency AND a great time resolution.
>> À la fois le beurre et l'argent du beurre. Well, this is simply impossible.
>> It's the audio / wave equivalent of Heisenberg's uncertainty principle.

Of course I don't agree with this. It is only true for the standard FFT algorithm, but not necessarily true for all time/frequency views in the whole world. Listen to the attached mp3 below; it's played from your FFT patch "3-record-play-speed-control". OK, there is a Jitter FFT view, and we listen to the sound computed back from the FFT view. This is a destructive transformation: the rhythmic fidelity is poor (like you said, we have the frequency precision, so we don't have the time precision). This sound is just "a vague memory of my sound", thus the graphic view is also only a vague memory. At this point I think that, in spite of his uncertainty principle, Heisenberg would rather logically have agreed with me that if a "blurry pâté" view is only a vague memory of a sound, then something is missing...

If building any sound by additive synthesis is virtually possible, then there must exist in the universe a way to decompose any sound, without anything missing.

"Multi-resolution" in the way that iZotope explains it, I'm not sure, but maybe with a 3D matrix, the third dimension representing 12 different FFT sizes from 16 to 32768. Then add or multiply (or something in between) the 12 planes of this third dimension.

Wavelet transform? On the Wikipedia page that I linked above, they say:

"[about fft:] A narrower window gives good time resolution but poor frequency resolution. (...) This is one of the reasons for the creation of the wavelet transform, which can give good time resolution for high-frequency events, and good frequency resolution for low-frequency events."

Thanks Vanille for the link to [wavelet~]. Well, I don't understand how to use it to make a sonogram... The only thing I was able to make was a pitch-stretch (in attachment). Has anybody seen a nice sonogram made from a wavelet transform?

Playing with the sampling rate? Well, I feel that if you upsample, the frequency resolution should go down, and when you downsample, the frequency resolution goes up... for the preserved low frequencies. (Plus another idea for the high frequencies, not sure: highpass filter > freqshift~ (down) > downsampling => better frequency resolution for high frequencies too?)

I'm not sure, maybe a mix between option a and option c. Hmm, a bit of an "usine à gaz"... (French expression, literally: a bit like a gas factory)

I was hoping that someone already had a nice solution. It's not that I'm lazy, but there are so many things to work on. I'm not sure I'll start on this now, plus I'm not so experienced with Jitter.

Anyway, it's cool to read people interested in this topic.
Thanks,

** About additive synthesis for a good imitation of natural sounds: I want to clearly understand exactly where, in the period of the waveform, the energy of each frequency is, and how all this changes in time during the attack and the sustain of the sound. Trying with my example "funny_additive-synth" ( https://cycling74.com/forums/sharing-is-fun-funny-additive-synth-touchosc-bpatcher ), I see that the phase information is dramatically important for low-frequency instruments (less important for high-frequency instruments). Also, I'd like to see, for a soft violin for example, how blurred the harmonics are, and which ones, etc. (Additive synthesis from resonators can make some interesting blurred harmonics: mp3 example in https://cycling74.com/forums/sharing-expressive-resonance-on-moving-noises )

P.S.: the "million pixels" view was only a way of speaking. I didn't mean "one million frequency bins"; I think 16384 or 32768 would be enough.


Well that's a nice opinion to hold - but saying it doesn't make a difference to whether or not you can have both.

There are some ways of getting better resolution (look up LORIS and time-frequency reassignment) - wavelets have their own problems - I'm not an expert on them though, but I know enough to know that they aren't some kind of magic bullet solution to the tradeoff problem.

"If building any sound by additive synthesis is virtually possible, then there must exist in the universe a way to decompose any sound, without anything missing."

Yes - it's called an FFT - if you take an FFT and then do the iFFT, you reconstruct the signal exactly (except for calculation error) - the problems occur when you try to extrapolate this data further into sinusoidal tracks. This is the difficult bit - the analysis of the FFT data.

However - sinusoidal tracks aren't going to be enough to get you realistic sound - you probably need to do some kind of noise synthesis too (a la LORIS or ATS.)

You probably need clever tracking algorithms too, like the one used in Miller Puckette's sigmund~ or the stuff explored in Jez Wells' PhD - a good peak detection algorithm (possibly with time/frequency correction or reassignment), then some kind of noise analysis/reassignment - a good peak tracker and THEN an additive synth module.

So the long and short of it is this stuff is complicated and more or less at the forefront of what is going on. There isn't anything I'm aware of that is good to go for MaxMSP and of the kind of quality I would be interested in - I'm sure there are guys out there who have their own stuff, or are using LORIS in max or whatever, but that is a big coding project to take on in a lower level language like C. A year ago I started developing some tools that were intended to eventually allow me to do some really good additive synthesis in MaxMSP, but it got too complicated and I've put the project on hold.

I wasn't even at the stage of writing the peak detection algorithm (although I've written simple ones before), or the peak tracker, or looking at noise analysis - I was building a framework to allow me to do this kind of processing.

Anyway, it's not really clear how you're going to use the analysis data you get. I'd say you probably don't want a visual anyway, but rather you should use a numeric readout. You might want to check out sigmund~ for a start (it's got some neat tricks for sinusoid detection in it) and start googling the other stuff.

As for the ideas about changing sampling rates, I'm just doing this quickly in my head, but I think in theory the resolution problem doesn't change - remember that you are representing twice the frequency range:

At 48 kHz with a 4096-point FFT:
Bin width = 24000 / 2048 = 11.71... Hz
Window length = 4096 / 48000 = 85.3... ms

At 96 kHz with an 8192-point FFT:
Bin width = 48000 / 4096 = 11.71... Hz
Window length = 8192 / 96000 = 85.3... ms

You'll see from the above that the two situations are equivalent, just in the second you have twice the representable frequency range.....

Pointing again to another crazy FFT thing mentioned by AlexHarker: try a FFT size very very small. Not even 256, maybe just 64. What do you think you will get from a signal analyzed with an FFT and then re-synthesized with an IFFT (like going in and out of a pfft~)? Very poor quality? No! It will be almost perfectly the same. That's the never ending power of Fourier. But, as AlexHarker says, when you want to play with the internal data before re-synthesis, you'll have problems. Also, although the totality of the information is there, even with a very small window size, it doesn't mean that you can OBSERVE both time and frequency at high resolution.
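The 64-sample round trip can be tried outside Max with a small overlap-add sketch in Python/NumPy. This is not how pfft~ is implemented internally; the 50% overlap, the periodic Hann window and the function name `stft_roundtrip` are assumptions made for the example.

```python
import numpy as np

def stft_roundtrip(x, n_fft=64):
    """Analyse x in tiny overlapping FFT frames, then rebuild it by overlap-add."""
    hop = n_fft // 2
    # Periodic Hann window: two copies shifted by n_fft / 2 add up to exactly 1
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    out = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        spectrum = np.fft.rfft(x[start:start + n_fft] * window)   # analysis
        out[start:start + n_fft] += np.fft.irfft(spectrum, n=n_fft)  # untouched resynthesis
    return out

rng = np.random.default_rng(0)
signal = rng.standard_normal(6400)
rebuilt = stft_roundtrip(signal)
# Skip the first and last half frame, which are covered by one window only
print(np.max(np.abs(rebuilt[32:-32] - signal[32:-32])))
```

The printed error sits at the level of floating-point rounding, as long as nothing is changed between the FFT and the IFFT.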

A way to formulate the uncertainty principle is to say that the more we locate a signal in the time domain, the less we can locate it in the frequency domain, and vice versa.
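For reference, one common way to write this down for signals is the Gabor limit. The symbols are not from this thread: here \sigma_t is assumed to be the spread (standard deviation) of the signal's energy in time, in seconds, and \sigma_f the spread of its energy in frequency, in Hz.

```latex
\sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi}
```

For example, under these definitions an event localized to \sigma_t = 1 ms cannot have a frequency spread below about 80 Hz. Equality is reached only by a Gaussian-shaped pulse.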

Since the Uncertainty Principle is so recognized by scientists, if you manage to prove that wrong, you might be eligible for the Nobel prize, no kidding.


>> "the more we locate a signal in the time domain,
>> the less we can locate it in the frequency domain, and vice versa."

OK, I can agree with this sentence, but at a level far beyond the simple FFT. Of course, in a 5-sample sound it will be hard to find lots of frequencies. There is a point where this is right, but you cannot use that fact to object that I'll just get your blurry pâté for dinner.

I think you're kind of contradicting yourself in the two following sentences you wrote: as the point of FFT is to "play with the internal data before re-synthesis", and as you say we "have problems" doing so, then where is the "never ending power of Fourier"?

>> try a FFT size very very small. (...) It will be almost perfectly the same.

I noticed this too, so what? Just because "the totality of the information is there" doesn't mean that the data (especially the phase data) means something to humans, or that we have to grovel to the FFT as the ultimate possible sonogram.

By the way, I don't get why my "play-fft-size-4096.mp3" above, from your FFT patch "3-record-play-speed-control", sounds so different from what I get with a simple fftin>cartopol>poltocar>fftout, even though I played it at normal speed. Do you know?

>> sinusoidal tracks aren't going to be enough to get you realistic sound -
>> you probably need to do some kind of noise synthesis too

True, we would need an infinite number of sinusoids to make a really nice noise. In fact the better way to do additive synthesis for noisy instruments that have blurred harmonics, like flute or violin, is to do the opposite: start from white noise, then filter it with resonators~ (like I did in https://cycling74.com/forums/sharing-expressive-resonance-on-moving-noises ).
...Then I've got this intuition:

To build the ultimate sonogram, where we'll be able to FULLY CLEARLY distinguish noise from pitched content, I guess we would virtually need to mix the data from an infinite number of FFTs, each one using a different FFT size, not only powers of 2. An equivalent, and maybe more efficient, way of going about it would be to resample the sound at different speeds using groove~, and then blend (or multiply?) all the "Jitter FFTs" from each sound.

>> ...sampling rates I'm just doing this quickly in my head but
>> I think in theory the resolution problem doesn't change

You're right! I was wrong about my "option c" above. The sampling rate doesn't change much.

Thanks for pointing out sigmund~! It is a pretty nice object. I'm not sure how I could use it to draw a sonogram, but I'll need it for another application! About LORIS: it looks interesting, but I'm not into C++, so I'm not able to try it.

You're probably right here, so if my options b and c are gone, I should try "option a" (and not only with powers of 2, as my intuition tells me).

Jean-Francois, if I manage to handle these damned Jitter objects, I don't give your damned uncertainty principle a long life...


"As the point of FFT is to "play with the internal data before re-synthesis""

Not necessarily. The FFT can be used for that, but its purposes are far more general - if you start googling you'll see that many engineers use FFTs as analysis tools in fields that have nothing to do with sound.

"To build the ultimate sonogram, where we'll be able to FULLY CLEARLY distinguish noise from pitched content, I guess we would virtually need to mix the data from an infinite number of FFTs, each one using a different FFT size, not only powers of 2"

Not really - there are other ways to tell noise from deterministic content (either by phase calculation or lobe width, or by median filtering etc.) - The problem is nothing to do with the FFT size - the problem is that in a noisy signal, peaks appear in single FFT frames that often look almost identical to sinusoidal components, but aren't.

Another problem that no-one has mentioned yet is that a single sine wave will excite all the bins in the FFT to some extent (assuming it is not EXACTLY on a bin frequency) - windowing improves the situation to some extent, by suppressing sidelobes, but it widens the main lobe (which will look like blurring in a sonogram). One of the good things about sigmund~ is that it takes account of this and attempts to correct for it, which leads to more accurate frequency and amplitude values.
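The leakage and main-lobe effects described above can be seen in a few lines of Python/NumPy. This is only a sketch with made-up parameters (a 1024-point frame, a sine at bin 100 or at bin 100.5) and says nothing about how sigmund~ corrects for it.

```python
import numpy as np

n_fft = 1024
n = np.arange(n_fft)
rect = np.ones(n_fft)                               # no windowing
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)    # periodic Hann window

def db_spectrum(cycles_per_frame, window):
    """Spectrum in dB, relative to its own peak, of one windowed sine frame."""
    frame = np.sin(2 * np.pi * cycles_per_frame * n / n_fft) * window
    mag = np.abs(np.fft.rfft(frame))
    return 20 * np.log10(mag / mag.max() + 1e-12)

far_bin = 200   # a bin far away from the sine
leak_on_bin = db_spectrum(100.0, rect)[far_bin]         # sine exactly on a bin
leak_off_bin = db_spectrum(100.5, rect)[far_bin]        # sine halfway between two bins
leak_off_bin_hann = db_spectrum(100.5, hann)[far_bin]   # same sine, Hann window

# Main-lobe width: number of bins within 20 dB of the peak for an on-bin sine
width_rect = int(np.sum(db_spectrum(100.0, rect) > -20))
width_hann = int(np.sum(db_spectrum(100.0, hann) > -20))

print(leak_on_bin, leak_off_bin, leak_off_bin_hann, width_rect, width_hann)
```

Without a window, the off-bin sine leaks into bin 200 far more strongly than the on-bin one. The Hann window pushes that leakage far down, but the on-bin peak now occupies three bins instead of one, which is the blurring mentioned above.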

"Thanks for pointing out sigmund~ (...) I'm not sure how I could use it to draw a sonogram"

Well I'm not sure you actually want a sonogram, which will almost certainly be blurry to some extent - if you want to know what the sinusoidal components are doing you should be plotting points or lines rather than spectral data directly (which seems to be too blurry for your tastes). You could build something like this with sigmund~ (in track mode) and Jitter. Alternatively you could download gabor and FTM and check out the drawing of spectral data they do, which is a bit like this. It sounds like you want to plot sinusoidal peaks, not FFT data (like a sonogram) - which will give you precise points, but will ignore any noise components.
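To give an idea of what "plotting points" means, here is a deliberately simple peak picker in Python/NumPy. It is not sigmund~'s algorithm: the Hann window, the -40 dB threshold, the parabolic interpolation and the name `spectral_peaks` are all assumptions made for this sketch.

```python
import numpy as np

def spectral_peaks(frame, sr, threshold_db=-40.0):
    """Return (frequency_hz, level_db) pairs for the local maxima of one frame."""
    n_fft = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    db = 20 * np.log10(mag / mag.max() + 1e-12)
    peaks = []
    for k in range(1, len(db) - 1):
        if db[k] > threshold_db and db[k] > db[k - 1] and db[k] >= db[k + 1]:
            # A parabola through three dB values refines the peak position
            a, b, c = db[k - 1], db[k], db[k + 1]
            shift = 0.5 * (a - c) / (a - 2 * b + c)
            peaks.append(((k + shift) * sr / n_fft, b - 0.25 * (a - c) * shift))
    return peaks

sr = 44100
t = np.arange(4096) / sr
two_sines = np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1000.3 * t)
for freq, level in spectral_peaks(two_sines, sr):
    print(f"{freq:.1f} Hz at {level:.1f} dB")
```

Each frame gives a short list of points instead of a column of bins. Drawing those points frame after frame produces thin lines for the partials, and anything noisy below the threshold simply disappears from the picture.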

Multi-resolution sonograms will only give you a different tradeoff between frequency and time resolution in different frequency ranges (remember the FFT is linear, so we generally have not enough frequency resolution at the lower end, and far too much at the top, so a better choice is bigger FFTs for low frequencies (still with poor time resolution), and smaller ones for high frequencies (where we don't need the same linear resolution, so better time resolution is preferable)).

Smart people: Can I ask a really really stupid question that falls under the category of "someone must have thought of this?"

filter everything below (Nyquist/2) (take the bottom half of the spectrum)
Ring modulate at (Nyquist/2) (flip over the spectrum)
Take an FFT
Flip the top half of the FFT to the bass

Nice idea AudioMatt, but the resolution of the FFT is the same across the frequency range in the linear domain. The point is that we perceive frequency in a logarithmic way, so the same resolution down low seems like less resolution...

What you are suggesting wouldn't change the resolution at all so we'd get the same results, just in different bins....
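The linear-versus-logarithmic point can be put into numbers by counting how many FFT bins fall into each octave. A small Python sketch, assuming a 4096-point FFT at 44100 Hz (values picked for the example, like the function name):

```python
def bins_in_octave(low_hz, n_fft=4096, sample_rate=44100):
    """Count FFT bins whose centre frequency falls in [low_hz, 2 * low_hz)."""
    bin_width = sample_rate / n_fft
    return sum(1 for k in range(n_fft // 2 + 1) if low_hz <= k * bin_width < 2 * low_hz)

for low in (55, 440, 3520):
    print(f"{low}-{2 * low} Hz: {bins_in_octave(low)} bins")
```

With these settings the octave from 55 to 110 Hz gets 5 bins while the octave from 3520 to 7040 Hz gets 327. Moving part of the spectrum elsewhere before the FFT changes which bins it lands in, not how many bins per Hz there are.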

oooohhhhh.  yeeaah...
*hangs head in shame* 

AudioMatt, of course this doesn't work, but I think you were right to point at ring modulation!

In fact, I found iZotope RX far better than RavenPro. In iZotope RX, OK, there is this multi-resolution option in the "spectrogram settings", but it's not the most important one; there is some other great stuff like Time Overlap and Frequency Overlap.

In the image below, from iZotope RX, I compared the same sonogram of "Talk.aiff" using "overlap":

While time overlap is made by moving the sound a bit in time in front of the FFT window, I think the Frequency Overlap in iZotope RX must be made by moving the frequencies a bit in front of the FFT bins... I think it must use some kind of small ring modulation (like freqshift~ does, I think), just moving the sound's frequencies by a few hertz before doing the FFTs. Then, by blending all the FFTs, this sharpens the frequencies of the harmonics. (A bit like what I imagined using different FFT sizes.) I'm not yet satisfied, but that's the beginning of something.
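Whether iZotope RX really works this way is not stated anywhere in this thread. As a sketch of the idea only, the following Python/NumPy lines shift a frame down by half a bin before the FFT and compare the result with the same frame zero-padded to twice its length (frame size and variable names are assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)
n_fft = 256
n = np.arange(n_fft)
frame = rng.standard_normal(n_fft) * np.hanning(n_fft)

# (a) shift every frequency down by half a bin, then take an ordinary FFT
shifted = np.fft.fft(frame * np.exp(-1j * np.pi * n / n_fft))

# (b) zero-pad the same frame to twice its length and keep the odd-numbered bins
padded = np.fft.fft(frame, n=2 * n_fft)

same = np.allclose(shifted, padded[1::2])
print(same)
```

The two results are identical, so a small frequency shift before the FFT fills in the values between the existing bins, exactly like zero-padding does. That gives a smoother picture of the same windowed spectrum; the width of each peak is still set by the window length.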

>> "Well I'm not sure you actually want a sonogram"

Yes, it IS what I want. I want the sinusoid AND the noise content. I'd like to see with my eyes EVERYTHING that my brain can hear with my ears. I don't feel this is utopian.

If a 256 FFT window has good time resolution, and an 8192 FFT window has good frequency resolution, I don't see why you guys don't agree that I could have both by intelligently blending them, playing with contrast.
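For anyone who wants to try this blending outside Jitter, here is a minimal Python/NumPy sketch. Everything in it is an assumption made for the experiment: the hop of 128 samples, zero-padding the short window so both analyses share the same grid, the geometric mean as the "multiply" operation, and the function names. It does not settle whether the result beats the tradeoff discussed in this thread.

```python
import numpy as np

def centered_stft_mag(x, win_len, n_fft, hop):
    """Magnitude STFT with frames centred on multiples of hop, zero-padded to n_fft."""
    xp = np.pad(x, n_fft // 2)               # zeros on both sides
    window = np.hanning(win_len)
    offset = (n_fft - win_len) // 2          # keeps short and long windows centred together
    n_frames = 1 + len(x) // hop
    frames = np.stack([xp[i * hop + offset : i * hop + offset + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return mag / window.sum()                # similar peak height for both window sizes

def blended_sonogram(x, short=256, long=8192, hop=128):
    """Geometric mean of a short-window and a long-window sonogram on one grid."""
    sharp_in_time = centered_stft_mag(x, short, long, hop)
    sharp_in_freq = centered_stft_mag(x, long, long, hop)
    return np.sqrt(sharp_in_time * sharp_in_freq)
```

A cell stays bright only where both analyses show energy, which is the "contrast" effect. Replacing `np.sqrt(a * b)` with `np.minimum(a, b)` is another blend worth comparing.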

Wow.. I'm wondering again about wavelets after seeing this:

http://www.youtube.com/watch?v=aRqtZWIirCA
http://stevehanov.ca/wavelet/

This guy is showing more interesting images made with wavelets than what I had seen before in http://books.google.com/books?q=illustrated+wavelet+transform+handbook and in the link that Vanille pointed to.

Argh, the software is for Windows. I'm going to borrow my girlfriend's PC and have a look at it.

Is there any good example patch using the [wavelet~] object from CNMAT somewhere?

http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html

Even this expert can't escape the uncertainty, and he explains it well.

And don't get fooled by pictures, they sometimes show precision which does not exist and doesn't mean anything either. Our ears are limited by these rules as well, and we can fake sound pretty easily.

If you make the correct assumptions, you can get pretty amazing results. MP3 works pretty well, even though a lot of the information in the original signal is just dropped. If you drop only what is irrelevant, you won't notice it...

Your picture of resynthesis, for example, assumes tonal content is the most important. The old dream of additive synthesis...

I got good results by separating the tonal components from the noise components, and only processing the (simplified) tonal part. It was necessary to ignore the noise part and simply mix it in again after processing. But as the noise part carries most of the perceivable time structure, the results were promising. Though the tonal aspects had been blurred by the processing, the noise part still carried the time structure...

That's a very nice tutorial on wavelets, Stefan! Thanks for the link -- I hadn't seen that before.


There's something more than time and frequency overlap: the time-frequency "reassignment" that AlexHarker pointed out:

"Compared with the classic spectrogram (aka 'waterfall') display, reassigned spectrograms can offer better resolution in the time- as well as in the frequency domain. (...) by comparing the phase between two neighbouring frequency bins (within the same STFT) it is possible to relocate the energy from that cell along the time(!) axis. By comparing the phase in a frequency bin (between two neighbouring STFTs), it is possible to relocate the energy from that cell along the frequency(!) axis." 

http://www.qsl.net/dl4yhf/speclab/ra_spectrogram.htm

http://www.nbb.cornell.edu/neurobio/land/PROJECTS/ReassignFFT/index.html

"the method of reassignment sharpens blurry time-frequency data by relocating the data according to local estimates of instantaneous frequency and group delay." 

http://en.wikipedia.org/wiki/Reassignment_method

By checking the "Enable reassignment" box in Izotope RX while using time & frequency overlaps, you can get fine pitch tracking, as in the first image below, of a female singing voice (the "shafqat.aif" CNMAT audio example). It is far better than a standard FFT without overlap (2nd image).

In the end, the wavelet window software didn't really convince me about wavelets, plus it is damned slow. I find FFT with reassignment and overlaps more precise than wavelets.

(By the way, RavenPro also has time and frequency overlap options, but they are lost among hundreds of other options; I just found them in "configure spectrogram".)


>> "don't get fooled by pictures, they sometimes show precision which does not exist and doesn't mean anything either.

...I laughed when I saw the following image, thinking about this advice from Stefan. Hmm... looks like some UFO entered my signal:

This is a very deep zoom, with *128 overlap & reassignment, into the spectrogram of a simple [saw~]. I'm not kidding!

:-D

Here's a wavelet image of the same soundfile, shafqat.aif.

Wavelet analysis is pretty good, but indeed slow. Comparing the images, the FFT with reassignment and overlaps appears to offer better precision, especially in the high frequencies.

I'm curious which FFT software allows for reassignment. Izotope RX, any others?


is there any way to get a higher resolution image of that saw~ analysis? i think it'd be a pretty ace background :D:D

the frequencies arcing off the main bulk of energy like a magnetic field are interesting... anyone able to explain this?

But seriously, I'm sure this reassignment method is only 3/5 of the way to the best sonogram that can be done.

The artifacts produced by the reassignment method could be almost entirely cleared by multiplying together several reassigned sonograms computed with different FFT sizes, because from one FFT size to another these artifacts move, but the pitched content doesn't. (This should also give a better distinction between pitched and noise content, you see?)

I'm lost when it comes to the math behind this reassignment method. Plus it's somewhat slower to compute...

Any C-external developers interested in making an efficient [jit.reassignedFFT] object from the LORIS C libraries?

I'm also wondering if any voronoi jitter effect could approach this in some way, i asked this on the jitter forum : 

https://cycling74.com/forums/voronoi-like-effect-from-blurred-image

This reassignment method, combined with freq & time overlaps and with the idea of blending different FFT sizes, could really put pressure on this "Heisenberg's uncertainty principle" and open up useful new possibilities, like:

- Very fine polyphonic pitch tracking... There is already the [transcribe~] external, a pioneer, but it works really badly. But if you have fine harmonic pitch tracking (see another pitch tracking example in the image/mp3 below), then, even in a messy polyphonic sonogram, one could do great things. Imagine that you shrink the 16384*-line Jitter sonogram vertically by 2, by 3, by 4, by 5, etc., one copy per harmonic (assuming that most musical sounds have true harmonics with negligible inharmonicity), then intelligently add*multiply all these (antialiased) Jitter matrices: (H1+H2+H3+H4...) * ((H1+f)*(H2+f)*(H3+f)*(H4+f)*...), where f is a kind of "noise factor" to adjust... and you get a damned cool, understandable, polyphonic pitch tracking sonogram!

- And maybe, only once we have 20 GHz laptops, start to dream about the mythical "demixer"...

* (let's say an FFT size of 1024, i.e. 512 bins, * 32 frequency overlap = 16384)
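
A minimal sketch of the add*multiply idea above, in Python rather than Jitter, applied to a single sonogram column (one time slice). The name `harmonic_salience`, the choice of 4 harmonics and the `noise_factor` value are assumptions made for illustration, and plain decimation stands in here for the antialiased resizing mentioned above.

```python
def harmonic_salience(column, n_harmonics=4, noise_factor=0.1):
    """Score each bin as a candidate fundamental: (H1+H2+...) * ((H1+f)*(H2+f)*...)."""
    length = len(column) // n_harmonics      # highest fundamental whose harmonics all fit
    total = [0.0] * length
    product = [1.0] * length
    for h in range(1, n_harmonics + 1):
        for i in range(length):
            value = column[i * h]            # column squeezed by h: bin i*h lands on bin i
            total[i] += value
            product[i] *= value + noise_factor
    return [s * p for s, p in zip(total, product)]


# Toy column (hypothetical data): two "notes" with fundamentals at bins 10 and 13,
# each with four equally strong harmonics.
column = [0.0] * 64
for fundamental in (10, 13):
    for h in range(1, 5):
        column[fundamental * h] = 1.0

salience = harmonic_salience(column)
strongest = sorted(range(len(salience)), key=lambda i: salience[i], reverse=True)[:2]
print(sorted(strongest))
```

On this toy column the two strongest bins are the two fundamentals, 10 and 13. A real sonogram would need the same operation on every column, and probably interpolation instead of picking every h-th bin.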


> "By comparing the phase in a frequency bin (between two neighbouring STFTs), it is possible to relocate the energy from that cell along the frequency(!) axis."

you might get some ideas of how to approach this, from here:

https://cycling74.com/forums/getting-out-frequency-from-a-signal

near the end of the thread there is an example on how to calculate the "true frequency".

for the sake of simplicity in this example i've only used a single overlap. better results can be achieved with an overlap factor of 4 or higher.

vb
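
For readers without Max at hand, here is a rough Python sketch of the general phase-difference idea quoted above. It is not the patch from the linked thread: the function names, the Hann window, the hop of a quarter window (overlap factor 4) and the 1000 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def bin_phase(signal, start, size, k):
    """Phase of bin k for the Hann-windowed frame that begins at sample `start`."""
    acc = 0j
    for n in range(size):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / size)
        acc += window * signal[start + n] * cmath.exp(-2j * math.pi * k * n / size)
    return cmath.phase(acc)


def true_frequency(signal, sr, size, hop, k):
    """Estimate the frequency in bin k from its phase advance between two frames."""
    advance = bin_phase(signal, hop, size, k) - bin_phase(signal, 0, size, k)
    expected = 2 * math.pi * k * hop / size   # advance of a sinusoid exactly at bin centre
    deviation = (advance - expected + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return (k + deviation * size / (2 * math.pi * hop)) * sr / size


sr, size, hop = 44100, 512, 128               # hop = size / 4, i.e. overlap factor 4
sig = [math.sin(2 * math.pi * 1000.0 * n / sr) for n in range(size + hop)]
k = round(1000.0 * size / sr)                 # nearest bin, centred near 1033.6 Hz
estimate = true_frequency(sig, sr, size, hop, k)
print(round(estimate, 2))
```

The bin centre is about 34 Hz away from the tone, yet the estimate should land very close to 1000 Hz, because the phase deviation measures how far the component sits from the bin centre. A smaller hop (more overlap) keeps that deviation further from the wrap-around point.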


Thanks volker!
I haven't found the time to go further with this yet, but I will later. Thanks for your algorithm.


By the way, when you do this, you assume that the energy in a frequency bin is due to only one sinusoidal component.

>> By the way, when you do this, you assume that the energy in a frequency bin is due to only one sinusoidal component.

Hi Jean-Francois,
sorry, I'm not sure I understand. Could you explain what you mean a bit more?


Well, the FFT (or STFT) gives you, in each analysis window and for each frequency bin, first how much energy there is, and second a phase (and hence, between successive windows, a phase difference). But what it does not give you is the piece of information "how many partials are actually in the original signal in this frequency bin".

For instance, with an FFT size of 512 and a sampling rate of 44100 Hz, a frequency bin is about 86 Hz wide. You can know that in the frequency bin from 43 to 129 Hz there is a certain amount of energy. With the formula mentioned, using the phase difference, you can relocate the energy of that cell in frequency space, and you will maybe find that "the value" is 93 Hz. But what you don't know is IF there is one and only one sinusoidal component. Maybe, in your original signal, the energy in this frequency bin comes from one component at 67 Hz and another at 104 Hz. Or maybe it comes from 12 different components. That would be pretty different. When you use the phase difference to calculate a unique frequency, you first assume that this frequency is unique.
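
To make this concrete, here is a small Python sketch using the numbers above. The helper names, the Hann window and the hop of 128 samples are assumptions for illustration; the formula is the generic phase-difference estimate, not necessarily the exact one from the other thread.

```python
import cmath
import math

SR, SIZE, HOP = 44100, 512, 128   # HOP = SIZE / 4 is an assumption (overlap factor 4)


def bin_phase(signal, start, k):
    """Phase of bin k for the Hann-windowed frame that begins at sample `start`."""
    acc = 0j
    for n in range(SIZE):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / SIZE)
        acc += window * signal[start + n] * cmath.exp(-2j * math.pi * k * n / SIZE)
    return cmath.phase(acc)


def relocated_frequency(signal, k):
    """The single frequency implied by the phase advance of bin k over one hop."""
    advance = bin_phase(signal, HOP, k) - bin_phase(signal, 0, k)
    expected = 2 * math.pi * k * HOP / SIZE
    deviation = (advance - expected + math.pi) % (2 * math.pi) - math.pi
    return (k + deviation * SIZE / (2 * math.pi * HOP)) * SR / SIZE


def tone(freq, length=SIZE + HOP):
    """A plain sine wave at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * n / SR) for n in range(length)]


bin_width = SR / SIZE                                   # about 86 Hz
one_partial = tone(93.0)
two_partials = [a + b for a, b in zip(tone(67.0), tone(104.0))]

single_estimate = relocated_frequency(one_partial, k=1)   # bin 1 spans roughly 43 to 129 Hz
mixed_estimate = relocated_frequency(two_partials, k=1)
print(round(bin_width, 2), round(single_estimate, 1), round(mixed_estimate, 1))
```

With one partial, the estimate should come out near 93 Hz. With two partials in the same bin, the formula still returns a single number, and nothing in that number says that it stands for two components at 67 Hz and 104 Hz.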
