Extremely precise sonogram ?
Mar 20, 2010 at 10:07am
Extremely precise sonogram? (Posted on the Jitter forum by mistake; it's really a MaxMSP question.) I want an extremely precise sonogram, not just 2 or 3 times more precise than the current one. I want a million-pixel sonogram, with both time and frequency precision; I want to see the perfect fine shape of each harmonic, not blurry paté… I don't care if it takes 95% of the CPU, or even if it needs 2 minutes to compute a 2-second visualization. How do I reach this? Jitter? 32-times upsampling into an FFT? What about the wavelet transform*? Note: I tried some sonogram software on the Mac; it wasn't much prettier than [sonogram] in Max… * http://en.wikipedia.org/wiki/Short-time_Fourier_transform#Resolution_issues

Mar 20, 2010 at 8:30pm
A million-pixel sonogram. Let's see. Suppose we work at a sampling rate of 44100 Hz. 1024 frequency bins over a range of 22050 Hz means an FFT size of 2048 samples, i.e. about 46 ms. Oh, but you want both great frequency AND great time resolution. As we say in French, le beurre et l'argent du beurre (to have your cake and eat it too). Well, this is simply impossible. It's the audio/wave equivalent of Heisenberg's uncertainty principle. But by choosing a good analysis window size, you can get really nice sonograms, within Max, or with free software like Raven Lite and others. If you keep a window size of 256, for instance, you might not get what you want.
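To make the trade-off concrete, here is a tiny Python sketch with the numbers from this post:

```python
# Back-of-the-envelope check of the numbers above.
sr = 44100        # sampling rate, Hz
fft_size = 2048   # analysis window, samples

bin_width = sr / fft_size            # Hz per frequency bin
window_ms = 1000.0 * fft_size / sr   # duration of one analysis window, ms

print(bin_width)   # ~21.5 Hz: that's 1024 bins from 0 to 22050 Hz
print(window_ms)   # ~46.4 ms: shrink one number and the other grows
```

Any change that narrows the bins lengthens the window, and vice versa; that is the whole dilemma in two lines.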

Mar 21, 2010 at 2:46am
didn't soundhack have like 265,000 frames ? but i don't see how more frequency

Mar 21, 2010 at 7:16am
Hello Alexandre, http://cnmat.berkeley.edu/patch/4003 has a [wavelet~] external; i'm not sure you can do DWT time/frequency analyses with it, as i never used it… but hope that helps.

Mar 21, 2010 at 3:24pm
on a related note, I would absolutely love to know what all this is. Their sonograms look amazing. http://www.izotope.com/support/center/index.php?x=&mod_id=2&id=388 

Mar 21, 2010 at 3:41pm
iZotope tools are wonderful. But even they didn't break the uncertainty principle. For instance, from the page linked by AudioMatt: "Auto-Adjustable STFT… if you zoom in horizontally (time) you'll see that percussive sounds and transients will be more clearly defined. When you zoom in vertically (frequency), you'll see individual musical notes and frequency events will appear more clearly defined." Yes, that's exactly the point: you can get either good time or good frequency resolution. You could work on replicating their idea (linking the analysis to the zoom level) in Max; you could even work on making a "multiresolution" analysis (from their page: "spectrogram with better frequency resolution at low frequencies and better time resolution at high frequencies"): you could calculate the spectra with two different FFT sizes, then use the data from one or the other when displaying the low or high frequencies…
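To sketch that last idea outside Max, here is a hypothetical Python outline (the `stft_mag` helper and the 1 kHz crossover are illustrative choices, not iZotope's actual algorithm): analyze the same signal with two FFT sizes, then take the low bins from the large FFT and the high bins from the small one.

```python
import numpy as np

def stft_mag(x, size, hop):
    # magnitude STFT: one row per frame, one column per bin
    win = np.hanning(size)
    frames = [np.abs(np.fft.rfft(win * x[i:i + size]))
              for i in range(0, len(x) - size, hop)]
    return np.array(frames)

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 5000 * t)

hop = 256
lo = stft_mag(x, 4096, hop)   # fine frequency grid, coarse time grid
hi = stft_mag(x, 512, hop)    # coarse frequency grid, fine time grid

crossover = 1000.0            # Hz: below this, display `lo`; above, `hi`
lo_bins = int(crossover * 4096 / sr)
hi_start = int(crossover * 512 / sr)

n_frames = min(len(lo), len(hi))
low_part = lo[:n_frames, :lo_bins]    # the 220 Hz partial lives here
high_part = hi[:n_frames, hi_start:]  # the 5000 Hz partial lives here
```

A real display would also have to align the two analyses in time (their window centres differ) and rescale the bin axes before stacking them into one picture.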

Mar 23, 2010 at 3:14am
Sorry for the late response, but i was looking a bit more at RavenPro and at your quite interesting fft/jitter tutorials on the share pages. First i have to explain a little more why i want deep horizontal AND vertical resolution: i'm working on additive synthesis and would like to examine deep details in acoustic instrument sounds like bassoon, clarinets, contrabass, etc., to get inspiration on ways to reproduce them in a 200-harmonics-with-blur-factors additive-synthesis expressive system for a two-pen Wacom screen. (see below**) >> Oh, but you want both a great frequency AND a great time resolution. Of course i don't agree with this. If building any sound by additive synthesis is virtually possible, then there must exist in the universe a way to decompose any sound, without anything missing. option a: "multiresolution" in the way that iZotope explains; i'm not sure, but maybe with a 3D matrix, the third dimension representing 12 different fft sizes from 16 to 32768. Then, add or multiply (or something in between) the 12 different planes of this third dimension. option b: wavelet transform? On the wikipedia page that i linked above, they say: option c: playing with the sampling rate? Well i feel that if you do upsampling, the frequency resolution should go down, and when you downsample, the frequency resolution goes up… for the preserved low frequencies. (Plus another idea for high frequencies then, not sure: highpass filter > freqshift~ (down) > downsampling => better frequency resolution for high frequencies too?) I'm not sure, maybe a mix between option a and option c.
Hum, un peu une usine à gaz… (a French expression, literally "a bit of a gas factory", i.e. an over-complicated contraption). Anyway, cool to read people interested in this topic, Alexandre ** About additive synthesis for a good imitation of natural sounds: i want to clearly understand exactly where, in the period of the waveform, the energy of each frequency is, and how all this changes in time during the attack and the sustain of the sound. Trying with my example "funny_additivesynth" ( http://cycling74.com/forums/topic.php?id=25102 ), i see that the phase information is dramatically important for low-frequency instruments (less important on high-frequency instruments). Also, I'd like to see, for a soft violin for example, how blurred the harmonics are, and which ones, etc. (additive synthesis from resonators can make some interesting blurred harmonics: mp3 example in http://cycling74.com/forums/topic.php?id=20980 ) P.S.: "million pixels" view: it was only a manner of speaking, i didn't mean "one million frequency bins"; i think 16384 or 32768 would be enough.

Mar 23, 2010 at 2:01pm
"Of course i don't agree with this." Well, that's a nice opinion to hold, but saying it doesn't make a difference to whether or not you can have both. There are some ways of getting better resolution (look up LORIS and time-frequency reassignment). Wavelets have their own problems; I'm not an expert on them, but I know enough to know that they aren't some kind of magic-bullet solution to the tradeoff problem. "If building any sound by additive synthesis is virtually possible, then there must exist in the universe a way to decompose any sound, without anything missing." Yes, it's called an FFT. If you take an FFT and then do the iFFT, you reconstruct the signal exactly (except for calculation error). The problems occur when you try to extrapolate this data further into sinusoidal tracks. This is the difficult bit: the analysis of the FFT data. However, sinusoidal tracks aren't going to be enough to get you realistic sound; you probably need to do some kind of noise synthesis too (a la LORIS or ATS). You probably need clever tracking algorithms too, like the one used in Miller Puckette's sigmund~ or the stuff explored in Jez Wells's PhD: a good peak-detection algorithm (possibly with time/frequency correction or reassignment), then some kind of noise analysis/reassignment, a good peak tracker, and THEN an additive synth module. So the long and short of it is that this stuff is complicated and more or less at the forefront of what is going on. There isn't anything I'm aware of that is good to go for MaxMSP and of the kind of quality I would be interested in. I'm sure there are guys out there who have their own stuff, or are using LORIS in Max or whatever, but that is a big coding project to take on in a lower-level language like C. A year ago I started developing some tools that were intended to eventually allow me to do some really good additive synthesis in MaxMSP, but it got too complicated and I've put the project on hold.
I wasn't even at the stage of writing the peak-detection algorithm (although I've written simple ones before), or the peak tracker, or looking at noise analysis; I was building a framework to allow me to do this kind of processing. Anyway, it's not really clear how you're going to use the analysis data you get. I'd say you probably don't want a visual anyway, but rather a numeric readout. You might want to check out sigmund~ for a start (it's got some neat tricks for sinusoid detection in it) and start googling the other stuff. As for ideas about changing sampling rates, I'm just doing this quickly in my head, but I think in theory the resolution problem doesn't change; remember that you are representing twice the frequency range: For a 4096-sample FFT @ 48kHz: bin width = 24000 / 2048 = 11.71… Hz. For an 8192-sample FFT @ 96kHz: bin width = 48000 / 4096 = 11.71… Hz. You'll see from the above that the two situations are equivalent; in the second you just have twice the representable frequency range… Alex
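That arithmetic is quick to verify in Python:

```python
# Verifying the arithmetic above: doubling both the sampling rate and the
# FFT size leaves the bin width unchanged.
bw_48k = 48000 / 4096   # 4096-sample FFT @ 48 kHz
bw_96k = 96000 / 8192   # 8192-sample FFT @ 96 kHz
print(bw_48k, bw_96k)   # 11.71875 11.71875
```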

Mar 23, 2010 at 5:16pm
Hello Alexandre, here is a nice introduction to the DWT: http://perso.telecom-paristech.fr/~rioul/publis/198709meyerjaffardrioul.pdf AudioSculpt from IRCAM? Sadly that's not free; i don't know how the sonogram is done inside… POST SCRIPTUM: "If building any sound by additive synthesis is virtually possible, then there must exist in the universe a way to decompose any sound, without anything missing." Don't forget that in Max/MSP we are in the discrete world; there is no sound nor sinusoids in the computer, just samples; it's not the FT but the DFT, not the WT but the DWT; it's not a microscope to see the Atomic Truth of Sound, because the sound does not enter inside it; just a heap of transistors; IMHO of course.

Mar 23, 2010 at 5:39pm
Pointing again to another crazy FFT thing mentioned by AlexHarker: try a very, very small FFT size. Not even 256, maybe just 64. What do you think you will get with a signal analyzed with an FFT, then resynthesized with the IFFT (like going in and out of a pfft~)? Very poor quality? No! It will be almost perfectly the same. That's the never-ending power of Fourier. But, as AlexHarker says, when you want to play with the internal data before resynthesis, you'll have problems. Also, although the totality of the information is there, even with a very small window size, it doesn't mean that you can OBSERVE both time and frequency at high resolution. One way to state the uncertainty principle is: the more we localize a signal in the time domain, the less we can localize it in the frequency domain, and vice versa. Since the uncertainty principle is so well established among scientists, if you manage to prove it wrong you might be eligible for the Nobel prize, no kidding.
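A quick numpy sketch of this round-trip (non-overlapping rectangular frames for simplicity, rather than a real pfft~ with overlapping windows):

```python
import numpy as np

# Round-trip sketch of the claim above: even a tiny 64-point FFT rebuilds
# the signal exactly (up to floating-point error), because the FFT is
# invertible. Resolution problems appear only when you modify or interpret
# the bins, not when you merely pass through them.
rng = np.random.default_rng(0)
x = rng.standard_normal(64 * 100)          # arbitrary test signal

size = 64
frames = x.reshape(-1, size)
spectra = np.fft.rfft(frames, axis=1)      # "analysis"
y = np.fft.irfft(spectra, n=size, axis=1).reshape(-1)  # "resynthesis"

print(np.max(np.abs(x - y)))   # tiny: almost perfectly the same
```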

Mar 23, 2010 at 9:30pm
>> "the more we locate a signal in the time domain," Ok, i can agree with this sentence, but at a level far beyond the simple FFT transform. Of course in a 5-sample sound it will be hard to find lots of frequencies. There is a point where this is right, but you cannot use that fact to tell me i'll just get your blurry paté for dinner. I think you kind of contradict yourself in these two sentences you wrote: as the point of the FFT is to "play with the internal data before resynthesis", and as you say we "have problems" doing so, then where is the "never ending power of Fourier"? >> try a FFT size very very small. (…) It will be almost perfectly the same. I noticed this too, so what? It's not because "the totality of the information is there" that the data, especially the phase data, means something for humans, and that we have to grovel to the FFT as the ultimate sonogram possible. From AlexHarker's remarks: >> sinusoidal tracks aren't going to be enough to get you realistic sound  True, we would need an infinite number of sinusoids to make a really nice noise. In fact, the better way to do additive synthesis for noisy instruments that have blurred harmonics, like flute or violin, is to do the opposite: start from white noise, then filter it with resonators~ (like i did in http://cycling74.com/forums/topic.php?id=20980 ) >> …sampling rates I'm just doing this quickly in my head but You're right! I was wrong about my "option c" above; the sampling rate doesn't change much. Thanks for pointing to sigmund~! It is a pretty nice object. I'm not sure how i could use it to draw a sonogram, but i'll need it for another application! About LORIS: looks interesting, but i'm not into C++ and so i'm not able to try it. >> wavelets have their own problems You're probably right here, so if my options b and c are gone, i should try "option a"
(and use not only powers of 2, as my intuition tells me). Jean-Francois, if i manage to master these damned jitter objects, i don't give long life to your damned uncertainty principle… Cheers,

Mar 24, 2010 at 10:39am
"As the point of FFT is to "play with the internal data before resynthesis"" Not necessarily. The FFT can be used for that, but its purposes are far more general; if you start googling you'll see that many engineers use FFTs as analysis tools in fields that have nothing to do with sound. "To build the ultimate sonogram where we'll be able to FULLY CLEARLY distinguish noises from pitched content, i guess we would virtually need to mix the data from an infinite number of FFTs, each one using a different FFT size, not only powers of 2" Not really. There are other ways to tell noise from deterministic content (by phase calculation, or lobe width, or median filtering, etc.). The problem has nothing to do with the FFT size; the problem is that in a noisy signal, peaks appear in single FFT frames that often look almost identical to sinusoidal components, but aren't. Another problem that no one has mentioned yet is that a single sine wave will excite all the bins in the FFT to some extent (assuming it is not EXACTLY on a bin frequency). Windowing improves the situation, by suppressing sidelobes, but it widens the main lobe (which will look like blurring in a sonogram). One of the good things about sigmund~ is that it takes account of this and attempts to correct for it, which leads to more accurate frequency and amplitude values. "Thanks for pointing on sigmund~ I'm not sure how i could use it to draw a sonogram" Well, I'm not sure you actually want a sonogram, which will almost certainly be blurry to some extent. If you want to know what the sinusoidal components are doing, you should be plotting points or lines rather than spectral data directly (which seems to be too blurry for your tastes). You could build something like this with sigmund~ (in track mode) and jitter. Alternatively you could download Gabor and FTM and check out the drawing of spectral data they do, which is a bit like this.
It sounds like you want to plot sinusoidal peaks, not FFT data (like a sonogram), which will give you precise points but will ignore any noise components. Multiresolution sonograms will only give you a different tradeoff between frequency and time resolution in different frequency ranges. Remember the FFT is linear, so we generally have not enough frequency resolution at the lower end and far too much at the top; a better choice is bigger FFTs for low frequencies (still with poor time resolution) and smaller ones for high frequencies, where we don't need the same linear resolution, so better time resolution is preferable. A.
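The leakage-and-windowing point above is easy to see numerically; a small Python sketch (the 440 Hz / 1024-point values are arbitrary):

```python
import numpy as np

# A sine that falls between bin centres spreads energy across the whole
# spectrum; a Hann window suppresses the far sidelobes but widens the
# main lobe (the "blurring" seen in a sonogram).
sr, size = 44100, 1024
t = np.arange(size) / sr
x = np.sin(2 * np.pi * 440.0 * t)      # 440 Hz is not on a bin centre here

rect = np.abs(np.fft.rfft(x))                      # no window
hann = np.abs(np.fft.rfft(np.hanning(size) * x))   # Hann window

peak = int(np.argmax(rect))            # bin nearest 440 Hz
far = peak + 50                        # a bin well away from the sine

print(rect[far] / rect[peak])   # noticeable leakage without a window
print(hann[far] / hann[peak])   # orders of magnitude less with Hann
```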

Mar 24, 2010 at 1:02pm
Smart people: can I ask a really, really stupid question that falls under the category of "someone must have thought of this"? Filter everything below (Nyquist/2), i.e. take the bottom half of the spectrum. Instant awesome resolution in the bass?

Mar 24, 2010 at 2:51pm
Nice idea AudioMatt, but the resolution of the FFT is the same across the frequency range in the linear domain. The point is that we perceive frequency in a logarithmic way, so the same resolution down low seems like less resolution… What you are suggesting wouldn’t change the resolution at all so we’d get the same results, just in different bins…. A. 
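To put numbers on "the same linear resolution feels coarser in the bass", here is a small Python illustration (the pitch values are chosen arbitrarily):

```python
import numpy as np

# The ~21.5 Hz bin of a 2048-point FFT at 44.1 kHz, expressed in semitones
# at two different pitches.
bin_hz = 44100 / 2048

def bin_in_semitones(f_hz):
    # musical size of one bin step starting at frequency f_hz
    return 12 * np.log2((f_hz + bin_hz) / f_hz)

print(bin_in_semitones(110.0))    # ~3.1 semitones around A2: very coarse
print(bin_in_semitones(7040.0))   # ~0.05 semitones around A8: overkill
```

Same bin width in Hz everywhere, yet perceptually it spans three semitones in the bass and a twentieth of a semitone at the top.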

Mar 24, 2010 at 3:04pm
oooohhhhh. yeeaah… :) 

Mar 24, 2010 at 11:12pm
AudioMatt, of course this doesn't work, but i think you were right to point at ring modulation! In fact, i found Izotope RX is far better than RavenPro. In Izotope RX, ok, there is this multiresolution option in the "spectrogram settings", but it's not the most important thing; there is other great stuff, like Time Overlap and Frequency Overlap. In the image below, from Izotope RX, i compared the same sonogram from "Talk.aiff" using "overlap". While time overlap is made by moving the sound a bit in time in front of the FFT window, i think the frequency overlap in Izotope RX must be made by moving the frequencies a bit in front of the FFT bins… i think it must use some kind of little ring modulation (like freqshift~ does, i think), just moving the sound frequencies a few hertz before doing the FFTs. Then, by blending all the FFTs, this sharpens the frequencies of the harmonics. (A bit like i imagined using different FFT sizes.) I'm not yet satisfied, but that's the beginning of something. >> "Well I'm not sure you actually want a sonogram" Yes, it IS what i want. I want the sinusoid AND the noise content. I'd like to see with my eyes EVERYTHING that my brain can hear with my ears. I don't feel this is utopian. SOMETHING ELSE: wow… i'm wondering again about wavelets, seeing this: http://www.youtube.com/watch?v=aRqtZWIirCA This guy is showing more interesting images made with wavelets than what i had seen before in http://books.google.com/books?q=illustrated+wavelet+transform+handbook and in the link that Vanille pointed to. Arg, the software is for Windows; i'm gonna borrow my girlfriend's PC and have a look at it. Any software like that for the Mac? Any good example patch using the [wavelet~] object from CNMAT, somewhere? [attachment=128127,273]

Mar 25, 2010 at 11:25am
A nice tutorial on wavelets: http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html Even this expert can’t escape the uncertainty, and he explains it well. Your picture of resynthesis for example assumes tonal content as the most important. The old dream of additive synthesis… Stefan 

Mar 26, 2010 at 3:09am
That’s a very nice tutorial on wavelets, Stefan! Thanks for the link — I hadn’t seen that before. 

Mar 29, 2010 at 9:54pm
There's something more than time and frequency overlap: the time-frequency "reassignment" that AlexHarker pointed to: "Compared with the classic spectrogram (aka 'waterfall') display, reassigned spectrograms can offer better resolution in the time as well as in the frequency domain. (…) by comparing the phase between two neighbouring frequency bins (within the same STFT) it is possible to relocate the energy from that cell along the time(!) axis. By comparing the phase in a frequency bin (between two neighbouring STFTs), it is possible to relocate the energy from that cell along the frequency(!) axis." http://www.qsl.net/dl4yhf/speclab/ra_spectrogram.htm http://www.nbb.cornell.edu/neurobio/land/PROJECTS/ReassignFFT/index.html "the method of reassignment sharpens blurry time-frequency data by relocating the data according to local estimates of instantaneous frequency and group delay." http://en.wikipedia.org/wiki/Reassignment_method By checking the "Enable reassignment" box in Izotope RX while using time & frequency overlaps, you can get fine pitch tracking like in the first image below, from a singing female voice ("shafqat.aif", a CNMAT audio example), far better than a standard FFT without overlap (2nd image). The wavelet Windows software didn't really convince me about wavelets in the end; plus it is damned slow. I find FFT with reassignment and overlaps more precise than wavelets. (By the way, RavenPro also has time and frequency overlap options, but they are lost among hundreds of others; i just found them in "configure spectrogram".) [attachment=128487,297] [attachment=128487,298]

Mar 29, 2010 at 10:20pm
>> "don't get fooled by pictures, they sometimes show precision which does not exist and doesn't mean anything either." …I laughed when i saw the following image, remembering this advice from Stefan. Hmm… looks like some UFO entered my signal: [attachment=128488,299]

Mar 29, 2010 at 10:22pm
Here's a wavelet image of the same soundfile, shafqat.aif. Wavelet analysis is pretty good, but indeed slow. Comparing the images, the FFT with reassignment and overlaps appears to offer better precision, especially in the high frequencies. I'm curious which FFT software allows reassignment. Izotope RX; any others? [attachment=128489,300]

Mar 29, 2010 at 10:40pm
is there any way to get a higher resolution image of that saw~ analysis? i think it'd be a pretty ace background :D:D the frequencies arcing off the main bulk of energy like a magnetic field are interesting… anyone able to explain this?

Mar 30, 2010 at 12:04am
But seriously, i'm sure this reassignment method is only 3/5 of the way to the best sonogram that can be done. The artifacts produced by the reassignment method could be almost cleared by multiplying several reassigned sonograms made with different FFT sizes… because from one FFT size to another, these artifacts move… but the pitched content doesn't… (also, this should show a better distinction between pitched and noise content, you see?) I'm lost in front of the math under this reassignment method. Plus it's kind of slow to compute… Any interested C-external developers, to make an efficient [jit.reassignedFFT] object from the LORIS C libraries? I'm also wondering if any voronoi jitter effect could approach this in some way; i asked this on the jitter forum: http://cycling74.com/forums/topic.php?id=25579 This reassignment method, associated with freq & time overlaps, and associated with the idea of blending different FFT sizes, could get really cruel with this "Heisenberg's uncertainty principle", and open useful new possibilities, like: Very fine polyphonic pitch tracking… There is already the [transcribe~] external, a pioneer, but it works really badly. But if you have fine harmonic pitch tracking (see another pitch-tracking example in the image/mp3 below), then, even in a polyphonic messy sonogram, one could do great things: imagine that you divide the 16384* lines of the jitter sonogram vertically by 2, by 3, by 4, by 5, etc… harmonics (considering that most musical sounds have true harmonics with negligible inharmonicity), intelligently add*multiply all these (antialiased) jitter matrices ( (H1+H2+H3+H4…) * ((H1+f)*(H2+f)*(H3+f)*(H4+f)*…) where f is a kind of "noise factor" to adjust ) … and get a damned cool, understandable, polyphonic pitch-tracking sonogram! And maybe, only when we'll have 20 GHz laptops…, start to dream about the mythical "demixer"… * (let's say 1024 * 32 frequency overlap = 16384) [attachment=128504,301]

Mar 30, 2010 at 3:48pm
> “By comparing the phase in a frequency bin (between two neighbouring STFTs), it is possible to relocate the energy from that cell along the frequency(!) axis.” you might get some ideas of how to approach this, from here: http://cycling74.com/forums/topic.php?id=22200 near the end of the thread there is an example on how to calculate the “true frequency”. 
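For readers without Max at hand, the "true frequency" trick from that thread can be sketched in Python too (a phase-vocoder-style estimate; the 440 Hz test tone, window size, and hop are arbitrary choices):

```python
import numpy as np

# The wrapped phase difference of one bin between two neighbouring STFT
# frames locates the energy inside the bin, far more finely than the bin
# width itself.
sr, size, hop = 44100, 1024, 256
f_true = 440.0
n = np.arange(size)
win = np.hanning(size)

def frame_fft(start):
    t = (start + n) / sr
    return np.fft.rfft(win * np.sin(2 * np.pi * f_true * t))

X0, X1 = frame_fft(0), frame_fft(hop)
k = int(np.argmax(np.abs(X0)))             # coarse estimate: the peak bin

expected = 2 * np.pi * k * hop / size      # phase advance for a bin-centre tone
delta = np.angle(X1[k]) - np.angle(X0[k]) - expected
delta = (delta + np.pi) % (2 * np.pi) - np.pi   # wrap into [-pi, pi)

f_est = (k + delta * size / (2 * np.pi * hop)) * sr / size
print(f_est)   # close to 440 Hz, far finer than the ~43 Hz bin width
```

Note that this assumes a single sinusoid dominates the bin, which is exactly the caveat raised a few posts below.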

Apr 3, 2010 at 3:33pm
Thanks volker! 

Apr 3, 2010 at 6:38pm
By the way, when you do this, you assume that the energy in a frequency bin is due to only one sinusoidal component. 

Apr 3, 2010 at 8:07pm
>> By the way, when you do this, you assume that the energy in a frequency bin is due to only one sinusoidal component. Hi JeanFrancois, 

Apr 4, 2010 at 9:22pm
Well, the FFT (or STFT) gives you, in each analysis window, for each frequency bin, first how much energy there is, and second a phase difference. But what it does not give you is the piece of information “how many partials are actually in the original signal in this frequency bin”. 

Apr 4, 2010 at 10:56pm
Thanks for your explanation. Then, i hope i will find the time to master the jitter objects to show you an experiment with this (right now i don't): if you BLEND all the results of these FFT sizes, you will, i think, get past these disadvantages of the FFT that you are talking about, and see noise as more-or-less-filled areas, and see clear "lines" that really are pitched content. The demo version of Izotope RX is here: http://www.izotope.com/products/audio/rx/download.asp (all sonogram options are in "spectrogram setting")

Aug 7, 2010 at 6:59pm
I just began to read this very interesting topic! 0/ Is there some modified version of the FFT in which, instead of dividing the frequency axis in a "linear" way into "bins", the frequency axis is divided into "bins" on a log scale?? 1/ If I understand well, Multiresolution Processing is "only" 2/ Are you really sure that if we upsample, we cannot get more precise resolution??? I'm really not sure!!! Someone gave this example: I'm okay, but what about this: so by upsampling by 16, we can have a 16-times better frequency resolution with the same window length!!! (But in this trick, we make the assumption that we FORGET what's in the signal above 24kHz): *For a 65536-sample FFT @ 768kHz 3/ Alexandre, did you find some solutions for your problem? Is it possible to find such a high-resolution sonogram?? Thanks, Jebb

Aug 7, 2010 at 7:27pm
Hi Jebb, That was me, and the maths is correct. Unfortunately the FFT does not care what you can and can't hear. What is important is the Nyquist frequency; this has nothing to do with hearing. The FFT has applications far outside of audio; it is in itself a mathematical tool for doing certain things. The wavelet transform is more or less a specialised constant-Q filterbank; it works somewhat like the log-scale FFT you propose above. It's already been mentioned here and it has issues of its own. There are many papers available on the web about up-to-date spectral analysis and processing techniques. They are written by engineers with a very good grasp of the maths and theory behind these things, and some of them get very complicated. My take on this thread now (as when it started) is that it seems unlikely that posters here who do not have a firm understanding of the basics of spectral analysis and techniques will come up with tools or techniques that are better than those created by experts in the field. If you guys really want to get into this in detail then you are probably going to need to read a *LOT*, get very good at maths, and roll your own externals in C or Java. The time input to get serious with this is going to be very large indeed. Good luck, Alex

Aug 7, 2010 at 8:15pm
Hi AlexHarker, Of course your math is correct! I did not say the contrary at all ;) I just asked: what happens if we make a further hypothesis, i.e. "forgetting" about the frequencies above 24kHz? It is always possible to do an FFT with: This is possible, we only store in the matrix what interests us, and I make the *assumption* that we don't want to store anything above 24kHz. The *right* question is rather: is it clever or not? i.e.: what will happen when we do the ****inverse FFT****?? Will the *data lost* (we have made the assumption to forget about it) above 24kHz (with a sampling rate of 768kHz) make enormous distortion? How will it sound? Far from the original signal or not? That is the question! Do you have an idea? Jebb

Aug 7, 2010 at 9:32pm
I am sorry, I do not believe you are correct. In order to carry out your method I have to "only store in the matrix what interests us". Please explain to me how you do this? At best you have proposed some kind of alternative way of thinking about zero-padding; however, this leads to spectral interpolation, which is not at all the same as true resolution… A.

Aug 7, 2010 at 11:01pm
It depends what you want to do. For this topic: you only want a very precise sonogram. Let x be the signal, w the window function. Is it possible to calculate values |S(m, omega)|^2 for all values of omega, even if they are very close to each other? By increasing the sample rate (the number of n's increases, and the window w(n) changes according to the sample rate, in order to keep a constant window length in milliseconds), we can compute many more values. (I'm speaking just about plotting a sonogram.)

Aug 7, 2010 at 11:39pm
You describe two scenarios there (from a quick reading). 1 – calculating values for frequencies that do not fit into the sample length an integer number of times. This equates to zero-padding (at least if we start by doing halfway between the bins, then halfway again, and so on). If I zero-pad the data to twice the size, I will be calculating the halfway points between the bins of the previous size, for instance. This gives ideal interpolation, which may result in a clearer/nicer sonogram (yes, in this way you can keep reducing the bin width), *but* it is not the same as true resolution, because you cannot use this method to resolve closely spaced sinusoids. So: you have interpolated your data nicely, but you have not gained additional information about the signal. In certain ways this will be "more precise", at least to the eye, but the critical information is already encoded in the FFT data, and for many applications (such as more accurate location of spectral peaks) there are well-known techniques for deriving the information (such as parabolic interpolation). https://ccrma.stanford.edu/~jos/st/Zero_Padding_Applications.html 2 – when you increase the sample rate you also increase the representable range (the Nyquist frequency increases), so the bin width for the same FFT size decreases. Double the sampling rate and the Nyquist frequency doubles. Zero-padding doesn't seem to have come up in this thread, so that's a useful addition, although it's a very well-known technique; implementing it in MSP is pretty straightforward. In regards to oversampling I repeat what I have said earlier: you cannot gain frequency resolution by oversampling. Alex
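Scenario 1 can be demonstrated in a few lines of Python (the 440/460 Hz pair and the 16x padding factor are arbitrary): the zero-padded spectrum is much denser, but two sinusoids spaced more closely than the window's bin width still merge into a single lobe.

```python
import numpy as np

# Zero-padding interpolates the spectrum; it does not resolve sinusoids
# closer than the 1024-sample window's ~43 Hz bin width.
sr, size = 44100, 1024
t = np.arange(size) / sr
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 460 * t)
win = np.hanning(size)

plain = np.abs(np.fft.rfft(win * x))                # 513 points
padded = np.abs(np.fft.rfft(win * x, n=16 * size))  # 8193 points, interpolated

# Around 430-470 Hz the padded spectrum is a smooth, dense curve, but it
# shows one merged lobe with no dip between the two sines.
lo = int(430 * 16 * size / sr)
hi = int(470 * 16 * size / sr)
region = padded[lo:hi]
print(region.min() / region.max())   # close to 1: no separation at all
```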

Aug 8, 2010 at 2:04am
1/ Thanks Alex, I begin to see what you mean now… :) 2/ Is there a Max/MSP patch that plots *beautiful* color sonograms? 3/ A general question about the FFT. Let's say I have a 1-second-long mono 96kHz wav file containing a pure 440Hz sine.

Aug 8, 2010 at 7:39am
1 – Unfortunately I don't think you'll see any improvement at all. 3 – The FFT data will be spread across more than one bin, except in the situation in which the sine wave is tuned *exactly* to the centre of a bin and no windowing is applied (a rectangular window is used). That situation results in only one FFT bin being excited, but it is a fairly useless one, because when the sine wave is not tuned to the centre of a bin many other issues occur and the leakage is bad. Windowing is a way of trading the width of the central lobe against the amplitude of the sidelobes, resulting in a slightly enlarged central lobe but suppressed sidelobes. Whether this is visible as more than a single pixel depends on the exact method of drawing the data, but that is indeed why you see a hazy cloud around the partials in the plots above. If the display is not a sonogram but rather the plot of a partial-tracking algorithm (such as in SPEAR), then it is possible for the sine wave to appear as a single line; however, this type of display will not deal with noise content well.
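The on-bin special case is easy to reproduce in Python (arbitrary values; rectangular window, i.e. no windowing at all):

```python
import numpy as np

# A sine exactly on a bin centre excites one single bin; detune it by half
# a bin and energy spreads across the whole spectrum.
sr, size = 48000, 1024
n = np.arange(size)
bin_hz = sr / size                                  # 46.875 Hz per bin

on_bin  = np.sin(2 * np.pi * (10.0 * bin_hz) * n / sr)
off_bin = np.sin(2 * np.pi * (10.5 * bin_hz) * n / sr)

mag_on  = np.abs(np.fft.rfft(on_bin))
mag_off = np.abs(np.fft.rfft(off_bin))

# largest remaining bin, relative to the peak
others_on  = np.delete(mag_on, 10).max() / mag_on[10]
others_off = np.delete(mag_off, [10, 11]).max() / mag_off.max()
print(others_on)    # near machine precision: one clean line
print(others_off)   # roughly a third of the peak: heavy leakage
```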

Aug 8, 2010 at 10:13am
Thanks AlexHarker for your answer. 1 and 2: Yes, I begin to understand now… But in order to convince myself / do some tests / learn more about that, I'd like to run tests myself. Would someone have some code that draws a nice sonogram (in Matlab or Max or something like that)? (Of course, code that does not just call a "Sonogram" or fft routine) … 3: So if I understand well, "in real life" (we can forget about the special case of a sine frequency that sits exactly in the centre of a bin), the STFT will almost NEVER give a nice 'line', even if the signal is a constant sine! There will always be some pâté! This is very interesting to notice, once and for all! Thanks a lot! 4: So the good solution would be:

Aug 9, 2010 at 1:16am
Great thread, lots to think about. Love the images too, especially the saw~ one. Would love a high-res version of that too, or a collection of similar ones (hint, hint…) I was wondering if there's any way to put poly~ to work on this, maybe some way to divide up the calculations across multiple poly~ patches? Could each one do (say) a fourth of the bins, or some other trick to speed up the processing? If you had a quad-core processor and could split up the work, that would really move things along. Though maybe it's not feasible, and you need to do everything in one process. Just a thought; I'm curious if I'm on the right track or way off…

Aug 9, 2010 at 7:23pm
>> jebb said: Sorry, I didn’t find the time to go over all this, plus AlexHarker is right that it should better involve some C or Java (at least for the REASSIGNMENT algorithm).
>> But this oversampling technique … That was Alexandre’s goal I think.
No! It was just a useless idea I thought about at the beginning of this thread. As AlexHarker pointed out many times in the thread – thanks for your patience, Alex :) – and explained to us beginners in FFT, OVERSAMPLING in itself will NOT increase resolution in the FFT.
>> Is it possible to find such a High Resolution sonogram??
I’m sure it is. When I said above “this reassignment method is only 3/5 of the way to the best sonogram that can be done,” I should have said 1/5… We should think about pixels in sonograms as “probabilities” for a frequency to be there. As many people pointed out above, at a particular instant of sound, at a particular sample, there are no frequencies at all; we can only guess a probability for a frequency to be around. A sonogram is nothing real or physical. It is just an imagination of a sound – exactly like our brain’s treatment of what we hear when we listen to music and sounds. So looking for “the deep truth of the signal” is not the right way to think about sonograms. The only physical truth is the signal itself. So what I dream of as an “extremely precise sonogram” is just something that approaches the “special treatment of probabilities” my brain performs when I listen to sounds and music. The “REASSIGNMENT method” of iZotope RX, used with ×32 X and Y overlaps, is far better than a standard FFT, but it’s still far from ideal because of the spider-web-like artifacts produced everywhere…
>> …be able to show just a 1-second-long line of 1 pixel of width at 440 Hz, but instead, there will always be artefacts.
Using a bunch of mixed sonograms made with the “reassignment method”, it will be possible, because, again, the artefacts MOVE when you change the FFT size:
> seejayjames said: hehe, you americans like ufos!
Here below is a not-so-much-more high-res version of the UFOs. You can make your own using the free trial of RX: http://www.izotope.com/products/audio/rx/download.asp This is not a real “saw” but a kind of dephased saw made from cosines instead of sines* (made using this: http://cycling74.com/forums/topic.php?id=25102 ) (*the first second of the sound attached below)
> Was wondering if there’s any way to put poly~ to work on this
Go ahead!
1 – Time overlap is already an option in the fft objects in Max, but frequency overlap is not: creating it by shifting the sound slightly, 16 or 32 times (using [freqshift~], maybe) should work, I think (shifting amount = bin width divided by 16 or 32). Then interleave all the 16 or 32 FFTs in one Jitter matrix.
2 – Find a way to apply the “reassignment method” to this, either by writing a C external object using LORIS: http://www.hakenaudio.com/Loris/ (maybe even swap the first step and process all the FFTs inside the external), or by using the example from Volker: http://cycling74.com/forums/topic.php?id=22200 (patch near the end of the thread).
3 – At this point, I think each poly~ should be used to compute a different sonogram using a different FFT window size. And some of the poly~ instances should also use a resampled sound, as I explained above: that is equivalent to non-power-of-2 Fourier window sizes. Examining reassigned sonograms in RX, my feeling is that power-of-2 window sizes alone are not enough to completely clear the artifacts.
4 – Simply mix all the sonograms together. The more sonograms you compute from different FFT sizes, the cleaner the result, I think.
Note: even if you have a 24-core CPU, I think you’ll stay far from a real-time sonogram from adc~… [attachment=138533,942]
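Step 4 above – mixing sonograms computed at different FFT sizes – can be prototyped offline. This is a hypothetical NumPy sketch, not a Max patch; the geometric-mean mixing rule is one arbitrary choice (it keeps only energy present at both resolutions), and the nearest-neighbour regridding is the simplest possible way to put the two bin grids in register:

```python
import numpy as np

def spectrogram(x, n_fft, hop=256):
    """Hann-windowed magnitude STFT, shape (bins, frames)."""
    win = np.hanning(n_fft)
    frames = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.abs(np.fft.rfft(x[i:i + n_fft] * win))
                     for i in frames]).T

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)           # test signal: steady 440 Hz sine

S_small = spectrogram(x, 1024)            # better time resolution
S_large = spectrogram(x, 4096)            # better frequency resolution

# Resample both onto a common (bins, frames) grid, then mix them.
bins = S_large.shape[0]
n_frames = min(S_small.shape[1], S_large.shape[1])

def regrid(S):
    # nearest-neighbour resampling of the frequency axis
    idx = np.round(np.linspace(0, S.shape[0] - 1, bins)).astype(int)
    return S[idx, :n_frames]

mixed = np.sqrt(regrid(S_small) * regrid(S_large))  # geometric-mean mix
```

Whether this actually sharpens anything is debatable – as Alex notes, the coarser analysis can smear peak locations – but it makes the idea concrete enough to experiment with.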

Aug 9, 2010 at 11:36pm
1 – The so-called frequency overlapping in the iZotope RX package seems to me a misnomer, as it is not, as far as I can see, really overlapping anything – it is clearly explained as zero-padding by the hint, and as such results in spectral interpolation as outlined above. If you want to emulate the effect then do zero-padding – don’t shift the sound around; that just reduces accuracy – and if you want to do averaging to improve the look of things, do it directly on the power spectrum by convolving with a small kernel.
2 – Multiresolution is generally used to apply more appropriate FFT sizes to different frequency ranges, not to average spectra – averaging spectra from different FFT sizes will in my view *reduce* accuracy, because the frequency resolution of the smaller FFT sizes will be poorer and hence spectral peaks will be poorly located.
3 – The reassignment thing definitely looks dramatic. Personally, however, I think if you really want sine waves as nice lines, track the peaks using a bit of zero-padding and parabolic peak interpolation (easier to do than the reassignment method) and plot them as lines – subtract them from the FFT and plot the noise residual as a standard sonogram if you want that too. If you want to be really clever, you can increase the accuracy of the peak finding by subtracting peaks from the entire spectrum as you go, in order of highest magnitude first, so that the effect of nearby peaks on one another is reduced (i.e. locate the largest peak – remove it – then repeat the process). That’s what happens inside sigmund~ – it’s very neat, very clever and a little bit complicated, but the source is available. In order for it to work you have to subtract the correct peak shape, which Miller has calculated according to the way he does his FFT / windowing. He also does some other neat stuff, like windowing in the frequency domain using convolution, so that he can examine the raw FFT alongside the windowed one without needing to do two separate FFTs…
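The zero-padding plus parabolic peak interpolation that Alex describes fits in a few lines. A hypothetical NumPy sketch (this is only the frequency-estimation step; it does not do sigmund~’s peak-shape subtraction, and the test frequency is an arbitrary off-bin choice):

```python
import numpy as np

sr = 44100.0
N = 1024
pad = 4                          # zero-pad factor: interpolates the spectrum
true_freq = 441.3                # deliberately not on any bin centre

n = np.arange(N)
x = np.sin(2 * np.pi * true_freq * n / sr) * np.hanning(N)

X = np.abs(np.fft.rfft(x, N * pad))        # zero-padded FFT (spectral interp.)
k = int(X.argmax())                        # coarse peak bin

# Parabolic interpolation on the log magnitudes of the 3 bins around the peak
a, b, c = np.log(X[k - 1]), np.log(X[k]), np.log(X[k + 1])
delta = 0.5 * (a - c) / (a - 2 * b + c)    # fractional bin offset in [-0.5, 0.5]
freq_est = (k + delta) * sr / (N * pad)    # refined frequency estimate in Hz
```

The coarse bin alone is only accurate to ±half a (padded) bin, about ±5 Hz here; the parabolic fit recovers the frequency to a small fraction of a hertz, which is what lets a tracked partial be drawn as a thin line instead of a smear.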
4 – Part of the problem with the noise moving under different settings probably has to do with the high variance of the power-spectrum estimate from a direct or singly-windowed FFT. You could try to improve this using this technique: http://en.wikipedia.org/wiki/Multitaper I wouldn’t expect this to tighten up your spectral peaks at all, but it might allow tighter timing resolution, or reduce noisy artifacts to some extent – although in terms of a sonogram, the visibility of these is highly dependent on the scaling of the values (I can make a lot of the background noise disappear in the organ example above simply by adjusting the sonogram scaling settings in iZotope RX). Alex
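The variance reduction Alex points to is easy to demonstrate. A hypothetical sketch using simple sine tapers (the Riedel–Sidorenko family) as a stand-in for the DPSS tapers described on the Wikipedia page – this is an illustration of the principle, not RX’s algorithm:

```python
import numpy as np

def multitaper_power(x, n_tapers=4):
    """Average the power spectra from several orthogonal sine tapers
    to reduce the variance of the spectral estimate."""
    N = len(x)
    n = np.arange(1, N + 1)
    est = np.zeros(N // 2 + 1)
    for k in range(1, n_tapers + 1):
        taper = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * k * n / (N + 1))
        est += np.abs(np.fft.rfft(x * taper)) ** 2
    return est / n_tapers

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)

single = np.abs(np.fft.rfft(noise * np.hanning(4096))) ** 2  # one window
multi = multitaper_power(noise)                              # 4-taper average

# For flat white noise the true spectrum is constant: the single-window
# estimate fluctuates wildly bin-to-bin, the multitaper one much less.
```

In a sonogram this shows up as less speckle in the noisy regions, at the cost of some extra smearing; as Alex says, it will not sharpen the sinusoidal peaks.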

Aug 10, 2010 at 3:41pm
Well, thanks again for your comments and tips.
> 1
> 2
> 3
> 4 – http://en.wikipedia.org/wiki/Multitaper
A bit off the subject, I wanted to say again that sigmund~ is a really nice object. Perhaps the polyphonic pitch-tracking idea that I tried to describe above (“harmonically multiply” everything together) could be better achieved using sigmund~. * These “blurred” harmonics are made using this: http://cycling74.com/forums/topic.php?id=20980 [attachment=138610,951]

Aug 21, 2010 at 4:40pm
What about this method for obtaining an extremely precise sonogram of a constant-pitch note:
1/ We want to determine an extremely precise sonogram at time t0? (I agree this makes no sense in general: as someone said, at a given “frozen” time, there is no sense in trying to know which frequencies are there.)
2/ An algorithm can find a “full period” around time t0. Just one period. Then we copy this period in order to build a fully periodic function (infinite in time).
3/ Then we can do an FFT with arbitrarily fine precision, because we can take a window as large as we want (as the signal is now replicated infinitely in time).
This could be useful for studying sounds that are clearly rather periodic… What do you think? jebb
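jebb’s three steps can be tried directly: estimate one period by autocorrelation, tile it, and FFT the tiled signal, so each harmonic lands exactly on a bin. A hypothetical NumPy sketch (441 Hz at 44.1 kHz is chosen so the period is exactly 100 samples; real sounds rarely have an integer-sample period, which is the method’s main practical catch):

```python
import numpy as np

sr = 44100
f0 = 441.0                            # period = sr/f0 = exactly 100 samples
t = np.arange(4096) / sr
x = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 3 * f0 * t)

# 1/ Estimate the period around t0 (= sample 0 here) via autocorrelation
ac = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0, 1, 2, ...
period = int(ac[20:2000].argmax()) + 20             # skip the zero-lag peak

# 2/ Replicate that single period to build a (long) periodic signal
tiled = np.tile(x[:period], 200)

# 3/ FFT of the tiled signal: each harmonic collapses onto a single bin
X = np.abs(np.fft.rfft(tiled))
f_est = X.argmax() * sr / len(tiled)   # fundamental, now a one-bin "line"
```

This gives the clean line spectrum jebb describes, but only because the replication forces perfect periodicity: any vibrato, noise, or non-integer period in the original signal is discarded along with everything outside the one chosen period.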

Aug 24, 2010 at 11:06pm
I’ve often thought about things you might be able to do with the new autotune patch. If you retune the signal to the FFT period, then re-transpose the display back up, I wonder if it would look any better.