Faster-than-realtime sonogram creation in Jitter?

antwan's icon

Hi there!

Although I've got quite a lot of mileage with Max, I'm somewhat new to both Jitter and spectral processing. I've been studying the J-F Charles spectral/Jitter patches closely, but I'm left pondering this question:

Is there a way to have Jitter read an audio buffer and draw a sonogram faster-than-realtime?
The only thing that comes to my mind is an upsampling poly but I'm interested in any/all techniques that more jitter-aware folks might have in their back pocket.

Any pointers greatly appreciated!

antwan

Jean-Francois Charles's icon

Hi Antwan,
You can surely achieve this with [jit.fft], but it's not trivial: in addition to the FFT itself, you also need to perform the windowing (Hanning or otherwise) and the overlap-add...
You're right, it would be great to have such a converter!

antwan's icon

Hi there,

And first off: a personal thanks for your great tutorials!

Attached is a "proof-of-concept" patch regarding using an upsampling poly for faster-than-realtime sonogram creation. But I'm definitely interested in the jit.fft idea. Problem is I seem to be quite lost on how to use jit.fft. Do you have any pointers for me to get started, if I may ask?

Indeed it would be lovely to get this working in a somewhat optimal way.

Thanks!
a

2759.jittersonogramuppoly.zip
AlexHarker's icon

Hey,

So you shouldn't do this in the audio domain at all: use an uzi/Jitter solution. I am attaching two patches that actually do buffer convolution in Jitter this way.

Unfortunately this is a little complicated and I don't have time to comment it. Also it doesn't do windowing and uses zero-padding. On the plus side it shows you how to loop over a buffer using jit.fft and do some processing.

The Offline FFT patch will hopefully be a useful starting point - you need to add windowing though, and maybe lose the zero-padding. The other patch shows one way of applying it...

Alex

2761.OfflineConvolution.maxpat
antwan's icon

Hi,

And thanks Alex for chiming in!
So I'm gonna have to take this somehow reeeeaaaal slow and step-by-step... cause somehow this move to Jitter-land and jit.fft got my head all confused. I hope you guys can bear with me.

Could you confirm whether this patch (pasted below) is essentially doing the FFT analysis on the first 4096-sample window of a buffer - still without windowing - and storing that info in the first column of a target matrix?

"Baby steps towards greater understanding" is my motto of the day.

Thanks so much for your help and patience.

Max Patch

a

AlexHarker's icon

So - looking mostly good:

1 - You should do a 4096 point fft on 4096 points (not a 2048 point one).

2 - You will get a full complex fft (like with the fft~ object)

3 - You are actually outputting 4097 points from jit.buffer~ - for 4096 points you need outputlast 4095

good luck!

antwan's icon

Thanks Alex!

Oh yeah, point 3 I had actually already realized but for some reason it was wrong in this example patch.
Point 1-2: So then I'm getting 4096 bins of data and hence the target matrix should also be 4096 high in this example?

If I were then to progress towards windowing and overlap, just to make sure I've got the theory behind it right:
- If I have a window size of 4096 and overlap of 2, I'd first send to the fft 0-4095 (treated with the windowing) then 2048-6143, etc.
- Continuing with 4096 window and an overlap of 2 - If my total buffer length were say 16384 (4*4096) I'd get 8 frames worth of data?
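
For reference, the frame arithmetic in the two points above can be sketched outside Max. This is only an illustration in Python; the helper name frame_starts and its pad_last flag are hypothetical. It assumes the hop is window size divided by overlap, and it shows that the count of 8 holds when the last frame, which runs past the end of the buffer, is kept and zero-padded; only 7 frames fit entirely inside 16384 samples.

```python
def frame_starts(total_len, win=4096, overlap=2, pad_last=True):
    """Start index of each analysis frame (hypothetical helper).

    hop = win // overlap. With pad_last=True, frames that run past the end
    of the buffer are kept (to be zero-padded); otherwise only frames that
    fit entirely inside the buffer are returned.
    """
    hop = win // overlap
    if pad_last:
        return list(range(0, total_len, hop))
    return list(range(0, total_len - win + 1, hop))


starts = frame_starts(16384)
print(len(starts), starts[:3])                   # 8 frames: 0, 2048, 4096, ...
print(len(frame_starts(16384, pad_last=False)))  # 7 frames fit without padding
```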

Thanks again,

a

AlexHarker's icon

Yes, yes and yes.

All this seems correct.

A.

antwan's icon

Cheers!
I'll try to make my way forward from here.

a

antwan's icon

Hello again,

Here's what I think would be a test patch for jit.fft analysis with window size 4096 / overlap of 2 with hanning windowing.

Somehow though it's not giving the results I'm expecting. If you look at the pwindows after the analysis, the left one does indeed show some data, but this seems to be only the phase data (?) because the right hand one (which should be showing the amplitude, as far as my understanding goes) appears empty.

If I could bother you once again (sorry!) to peek inside the patch and see if you understand where it might be going wrong?

Also any other optimizations or observations you may have are definitely of interest to me. For instance, if I feed it a longer buffer - say 1 min long - it gives me the beach-ball-freeze for the duration of the analysis, which doesn't feel nice. I wonder if that can be bettered.

Thanks, once again, for any help!

Max Patch

a

Jean-Francois Charles's icon

I might try to have a look, but I don't have Max running on my machine here. Thanks Alex for your tips (and for the convolution workshop in NYC, but that's another story). Yes, because of the windowing, and also because of the need to deal with the full FFT given by jit.fft, it might be easier to just use pfft~, depending on the timeline of your project:
- if you need to work with pre-recorded files, you could program a routine opening each file in a folder, generating its sonogram (amplitude and phase) at normal audio speed, and saving the result as a Jitter matrix file (.jxf). You could even run such a patch with an "offline" audio driver, thus achieving faster-than-realtime analysis.
- if you need to work with buffers that you record "on the fly", you could record the signal as a "sonogram" at the same time as you record it in the time domain into a buffer.

All best,

antwan's icon

Hi there!

Yes, I carefully considered all the options you described but came to the conclusion that - if I get it to work - this all-Jitter version might very well be the most versatile for buffer-based operations, because the jxf files quickly become absolutely huge. In a scenario - for the sake of example - where I might want to do the analysis on any one of, say, 200 long audio files or a certain part of a file, it would make most sense to be able to do it quickly "inline" as needed rather than to fill the hard disks with jxf's ten times the size of each equivalent audio file.

So if at a later time you have a chance to check if you can see what's wrong with the patch I'd seriously appreciate it.

Btw, I'm not entirely sure what you mean by "the necessity to treat the full FFT given by jit.fft". I hope I understood all I need to take care of when working with jit.fft. But I'm sure this will become clear when you see the patch.

Thanks for the continuing help & support!

a

AlexHarker's icon

So,

1 - full FFT - you get twice the number of bins (roughly) that you need for viewing a real FFT (you get all the negative freqs as well, but these replicate positive freqs but with the imag part reversed in polarity). You should only display the first N/2 +1 bins.

2 - This is going to be slow for a big buffer - it's a lot of processing. You can use defer/deferlow to possibly keep Max more responsive, but it certainly won't make it faster.

3 - I *may* be making an external to do this soon - possibly quite soon, but no promises on that or a timescale...
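
Point 1 can be checked numerically. The following is a minimal sketch in Python, not Max, using a naive DFT on a 16-sample real signal (the signal and the function name dft are made up for illustration). It shows that bin N-k is the complex conjugate of bin k, so only the first N/2 + 1 bins carry independent information.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, unnormalised; fine for a small demonstration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


N = 16
signal = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
          for t in range(N)]
spectrum = dft(signal)

# For real input, bin N-k is the complex conjugate of bin k:
# same real part, imaginary part reversed in polarity.
mirrored = all(abs(spectrum[N - k] - spectrum[k].conjugate()) < 1e-9
               for k in range(1, N // 2))

# Only bins 0..N/2 (inclusive) are needed for display.
half = spectrum[:N // 2 + 1]
print(mirrored, len(half))  # True 9
```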

Will look at the patch in a minute, but I don't have a couple of the objects... (externals)

Out of interest what is the final application?

A.

Pierre Alexandre Tremblay's icon

If there is a lot of demand (apart from me ;-) I might try to motivate Alex on this... so any pressure welcome (as well as bug reports on the convolution/IR suite)

p

antwan's icon

@Pierre
Sure, pressure pressure! :)

@Alex
Sorry, I noticed I was using an abstraction of my own in my pasted patch. Here's one without it, so that the only external/abstraction in use is cv.jit.cartopol. I tried to have a go at ignoring the second half of the bins but with no better results, so otherwise this patch is the same as earlier.

Can't thank you guys enough!

Max Patch

a

AlexHarker's icon

OK. So - you don't need cv.jit.cartopol, you can use two instances of jit.op (hypot / atan2), although you may need to flip imag/real parts in order to correctly calculate phase.
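
As a rough illustration of the hypot/atan2 pair (Python rather than jit.op; the function name to_polar is hypothetical): magnitude is hypot(real, imag) and phase is atan2(imag, real). The imaginary part comes first in atan2, which is probably what the remark about flipping the imag/real parts is getting at.

```python
import math


def to_polar(real, imag):
    """Per-bin magnitude and phase from separate real and imaginary lists."""
    amp = [math.hypot(re, im) for re, im in zip(real, imag)]
    # atan2 takes the imaginary part FIRST: atan2(imag, real).
    phase = [math.atan2(im, re) for re, im in zip(real, imag)]
    return amp, phase


amp, phase = to_polar([3.0, 0.0, -1.0], [4.0, 2.0, 0.0])
print(amp)    # [5.0, 2.0, 1.0]
print(phase)  # roughly 0.927, 1.571 (pi/2), 3.142 (pi)
```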

Anyway, as far as I can see this works just fine, but the numbers you get for amplitude are probably much smaller than you are expecting. I had to spill into a multislider and set the range smaller to see them.

A.

antwan's icon

Cheers!
Good to hear it's not all entirely screwed up!

Few questions:
1) "the numbers you get for amplitude are probably much smaller than you are expecting"
Now that you pointed it out for me, I see it too. Why is that and what would one do so they are "exactly what I'm expecting"? :)

2) Re: "you get twice the number of bins (roughly) that you need for viewing a real FFT"
To fix this would one for example create a jit.matrix of size 2048 - with srcdimstart 0 and srcdimend 2047 - between the jit.fft and the jit.unpack?

Thanks - once again. I'd definitely buy you several beers if I could!

a

AlexHarker's icon

1) You may want to use a logarithmic mapping to dB before you display. You should also try with a single sine wave on an exact bin frequency. That should be 1. in amplitude - otherwise you need to redo your scaling somewhere.

2) Yes - but better still use 2049 bins to get the Nyquist bin too (there are N/2 + 1 independent bins in a real FFT).
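
A minimal sketch of the dB mapping in point 1, in Python rather than Jitter. The -96 dB floor and the function names amp_to_db and db_to_grey are assumptions chosen for illustration. The idea is that small linear amplitudes, which look black in a 0..1 display, land well inside the visible range once mapped logarithmically.

```python
import math


def amp_to_db(a, floor_db=-96.0):
    """Linear amplitude to dB, clamped at floor_db so zero maps to the floor."""
    if a <= 0.0:
        return floor_db
    return max(20.0 * math.log10(a), floor_db)


def db_to_grey(db, floor_db=-96.0):
    """Map floor_db..0 dB linearly onto 0..1 for display."""
    return min(max((db - floor_db) / -floor_db, 0.0), 1.0)


for a in (1.0, 0.25, 0.001, 0.0):
    print(a, round(amp_to_db(a), 2), round(db_to_grey(amp_to_db(a)), 3))
```

With these assumed settings an amplitude of 0.001 (about -60 dB) maps to a grey level of about 0.375 instead of being indistinguishable from zero.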

A.

antwan's icon

Hello fellows,

I think I'm (still) in great need of help to get this right.

Here's what I've tried:

1)
I did a few tests with sine waves at a few of the (AFAIK) bin frequencies and I was getting for example 0.245448 in that bin and 0.122904 in the two neighbouring bins - or in another case 0.176725 in the expected bin and 0.088395 in the two neighbouring bins.

2)
I was also doing idiot tests like putting the amplitude values from the [jit.op @op hypot] through a [jit.op @op * @val 100.] and still got absolutely nothing visible in the so-called sonogram jit.pwindow (on bottom right).
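
One possible reading of the numbers in 1), offered as a sketch and not as a statement about how jit.fft scales its output: the 2:1 ratio between the peak bin and its neighbours is what a Hann window produces for a sine on an exact bin. ASSUMING the forward transform is divided by N, such a sine of amplitude A reads A/4 in its own bin and A/8 in each neighbour, so multiplying by 4 would recover A. The Python below uses a 64-point window as a small stand-in for 4096, and the function name hann_sine_bins is hypothetical.

```python
import cmath
import math

N = 64  # small stand-in for the 4096-point window


def hann_sine_bins(amplitude, k0, n=N):
    """Magnitudes of bins k0-1, k0, k0+1 for a Hann-windowed sine on bin k0.

    ASSUMPTION: the forward transform is divided by n.
    """
    frame = [amplitude * math.sin(2 * math.pi * k0 * t / n)
             * (0.5 - 0.5 * math.cos(2 * math.pi * t / n)) for t in range(n)]

    def mag(k):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        return abs(total) / n

    return mag(k0 - 1), mag(k0), mag(k0 + 1)


lo, peak, hi = hann_sine_bins(1.0, 8)
print(lo, peak, hi)                 # close to 0.125, 0.25, 0.125

# Under the same assumption, the values reported above would correspond to
# sine amplitudes of roughly 0.98 and 0.71.
print(4 * 0.245448, 4 * 0.176725)
```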

Here's what I'm hoping:

The reason I'd really like to get this "officially" spot-on is that eventually I'm hoping this won't only give me a decent sonogram of a buffer or a selected part of a buffer but that it would eventually also create data for a working inverse FFT with [jit.fft @inverse 1] - so that I could do spectral transforms on the data and bring it back into the audio domain.

I've read up quite a bit on FFT during this process and the problem mainly is that jit.fft simply isn't giving me what I'm expecting to get, so I'm quite lost on what to do to set it right.
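
The analysis/resynthesis round trip described here can be sketched in a few lines. This is Python with a naive DFT, not a jit.fft patch, and the test signal is made up. It assumes a periodic Hann window with an overlap of 2 and no synthesis window; under those assumptions the overlapping windows sum to 1, so inverse-transforming each frame's full complex spectrum and overlap-adding at the same hop recovers the input, except in the first and last half-window.

```python
import cmath
import math

N, HOP = 16, 8   # window size and hop (overlap of 2)


def dft(x, inverse=False):
    """Naive DFT; the inverse carries the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


window = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
signal = [math.sin(0.37 * t) + 0.3 * math.cos(1.1 * t) for t in range(64)]

# Analysis: window each frame and keep its full complex spectrum.
starts = range(0, len(signal) - N + 1, HOP)
frames = [dft([signal[s + t] * window[t] for t in range(N)]) for s in starts]

# (Spectral processing would go here.)

# Synthesis: inverse transform each frame and overlap-add at the same hop.
output = [0.0] * len(signal)
for s, spec in zip(starts, frames):
    for t, v in enumerate(dft(spec, inverse=True)):
        output[s + t] += v.real

# Away from the first and last half-window the input is recovered.
error = max(abs(output[i] - signal[i]) for i in range(HOP, len(signal) - HOP))
print(error < 1e-9)
```

Once the spectra are actually modified between analysis and synthesis, a synthesis window is usually applied as well, and the scaling has to be rechecked for that window pair.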

Below is the patch updated for the 2049 bin target matrix.
(Btw, I've noticed that when saved to disk and opened the windowing matrix's loadbang doesn't cut it, so that loadbang needs to be manually re-triggered... don't know why that is either)

As always before - I'd be infinitely grateful for any soul ready to take my hand and guide me down this dark path of complex numbers. Thanks again!

Max Patch

a