Spectrum of a buffer/audio file

woyteg's icon

Hi!
Any ideas how I could obtain the spectrum of an entire audio file as fast as possible?
By the spectrum of the file, I mean the average value of each bin, over the entire file, for example.
In order to be faster than realtime, I could imagine upsampling would be an option, or using uzi to retrieve the individual signal vectors as fast as possible..hm.. what do you guys think?
Very much appreciating any hints,
all the best
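
For readers following along, the quantity asked for here (the average value of each bin over the entire file) can be sketched outside Max in a few lines. This is only an illustration under stated assumptions: the frame size of 32, the naive DFT, and the names `dft_magnitudes` and `average_spectrum` are all made up for the example and do not come from the thread.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of bins 0..N/2 of a naive DFT (fine for a sketch, slow for real use)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def average_spectrum(samples, frame_size):
    """Average magnitude of each bin over all complete, non-overlapping frames."""
    n_frames = len(samples) // frame_size
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    sums = [0.0] * (frame_size // 2 + 1)
    for f in range(n_frames):
        frame = samples[f * frame_size:(f + 1) * frame_size]
        for k, m in enumerate(dft_magnitudes(frame)):
            sums[k] += m
    return [s / n_frames for s in sums]

# A sine with exactly 4 cycles per 32-sample frame should peak in bin 4.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32 * 8)]
spectrum = average_spectrum(signal, 32)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
```

No windowing or overlap is applied here; both come up later in the thread.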

vichug's icon

hm, wouldn't jit.buffer~ do the magic? this guy wrote a dissertation with a lot of audio-to-jitter: https://cycling74.com/forums/share-sonographic-sound-processing-diss/ this is overkill in your case, but i think inside this you can find what you wish for, no?

vichug's icon

hm i'm not sure of what i'm saying anymore. this guy apparently doesn't use jit.buffer~...

pdelges's icon

Another way may be to use ircam's free MuBu library.

vichug's icon

still jit.buffer~ seems like a track, with jit.fft you could manually slice the output of a jit.buffer~ into spectral frames and...

edit: indeed MuBu seems a better idea

woyteg's icon

thanks guys! Both of these are solutions I wouldn't have thought of (and didn't know this library),
happy I asked. I'm gonna look into this..
cheers!

woyteg's icon

hi again,
I just downloaded the library and it really seems *very* impressive. Do you guys have any hint how to start with extracting the spectrum? I'm looking through the help patches at the moment and can't quite find a way (and get distracted by great other features ;))

pdelges's icon

Check the mubu-mosaicing.maxpat example in the mubu-pipo folder.
It shows how to apply an MFCC analysis, but should be easy to modify.

woyteg's icon

hi!
This is great, I'll definitely post the results here as I find it really useful to have such a tool when it's finished..
One thing is left to complete this: I can't find the documentation of the pipo modules.. so I don't quite know how to configure the framesize for example.
I guess it's @fft.framesize or something, but do you guys know where i can find all the pipos and their parameters?
thanks again, you were a huge help already!

woyteg's icon
Max Patch

so, here is a finished abstraction that makes use of the Mubu library.
It needs the mubu object, mubu.track and mubu.process.
This abstraction is, of course, poorly tested. However, I will need this to work reliably, so in case I encounter any bugs, I will update this thread (please tell me if you are using this and have any problems, because again, I will need this to work reliably..). This is the abstraction, save it as pl.fileSpectrum.maxpat:

Max Patch

in order for this test patcher to work:

thanks again to vichug and patrick!

woyteg's icon

I ran into quite some problems with mubu.. any clever native max solutions? I mean there has to be something..
I didn't seriously look into the jitter suggestion, I have to admit. I will soon.. but I'm kind of sceptical.. any other ideas?
Thanks!

volker böhm's icon

i'm not quite clear what you want to do.
if you are interested in the spectrum of a whole buffer then you might like to check out vb.FFTWbuf~.
you can find it here http://www.esbasel.ch/software/#vb-objects
it takes one huge fft of the whole buffer, i.e. the freq resolution is directly dependent on the buffer size.
and of course it's not a native solution.

if you are more interested in the average spectrum and want to specify the number of freq bins (fft size) yourself (as in e.g. audacity), then jit.fft is probably the way to go, although it will be slower.
if you need a kick start, let me know - it's not hard to do.
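
To put numbers on the two options described above: the spacing between bins is the sample rate divided by the FFT size, so one FFT over the whole buffer gives very fine bins while a fixed frame size gives coarser ones. A small sketch, where the 44.1 kHz sample rate and the 10-second buffer are purely illustrative assumptions:

```python
def bin_spacing_hz(sample_rate, fft_size):
    """Distance in Hz between neighbouring FFT bins."""
    return sample_rate / fft_size

sample_rate = 44100                       # assumed for the example
buffer_length = 10 * sample_rate          # a hypothetical 10-second buffer

whole_buffer = bin_spacing_hz(sample_rate, buffer_length)  # one huge FFT
framed = bin_spacing_hz(sample_rate, 2048)                 # fixed 2048-point frames
frames_without_overlap = buffer_length // 2048             # complete frames to average
```

With these numbers the whole-buffer transform has bins 0.1 Hz apart, while 2048-point frames have bins roughly 21.5 Hz apart.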

woyteg's icon

Hi volker!
Thanks for taking the time. First, a theoretical question: what would be the difference between taking the Fourier transform of the whole signal vs. the average of STFTs/FFTs? I mean in the result? Of course I would have orders of magnitude better frequency resolution in the Fourier transform of the whole signal (which I don't need), but those would be accumulated values/amplitudes, right? So it would be an average too, right?
Eventually I would need RMS, so that wouldn't be the best solution I guess.. anyway, maybe I misunderstand something here, thanks for the hint to the external, I'll definitely check it out.
At the moment I'm trying to get the jitter version to run.. but since you offered

if you need a kick start, let me know – it’s not hard to do.

Well I can't resist, that would indeed be great!
Thanks!

woyteg's icon
Max Patch

Well so here is a max/jitter native version for anybody interested in it. For now it does a 2048 bin blackman windowed FFT, using jit.fft.
If you put in a straight sine, you will see errors at about -80dB. I don't know what that is.. missing zero padding? just a bug? normal? hm.. maybe you guys have an idea. Anyway, otherwise it seems to work pretty well. If anybody has problems with it, please let me know..
all the best, and thanks again to volker, patrick, and vichug

and a test patcher, expecting the above to be called pl.fileSpectrum2 and the HISS library spectrogram~ installed (just for the nice grid for the graph, I was lazy configuring plot~)

woyteg's icon
Max Patch

forgot the patcher:

woyteg's icon

Oh, just in case anybody is really using this: this outputs the RMS of each bin, over the entire file, and afterwards normalizes the result.
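
As a rough illustration of that computation (not a transcription of the patch): per-bin RMS sums the squared magnitudes over all frames, divides by the frame count, and takes one square root per bin at the end. The post does not say how the result is normalized; scaling so that the largest bin equals 1 is an assumption made here, and the function names are hypothetical.

```python
import math

def rms_per_bin(frames):
    """frames: list of spectra, each a list of complex bin values of equal length."""
    n_bins = len(frames[0])
    sums = [0.0] * n_bins
    for spectrum in frames:
        for k, c in enumerate(spectrum):
            # Squared magnitude straight from the real and imaginary parts.
            sums[k] += c.real * c.real + c.imag * c.imag
    # Mean over frames, then a single square root per bin.
    return [math.sqrt(s / len(frames)) for s in sums]

def normalize_to_peak(values):
    """Assumed normalization: scale so that the largest value becomes 1."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)

# Two tiny made-up "spectra" with two bins each.
frames = [[3 + 4j, 1 + 0j], [0 + 0j, 1 + 0j]]
rms = rms_per_bin(frames)
normalized = normalize_to_peak(rms)
```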

volker böhm's icon

looks good and seems to do what you want. here are a couple of thoughts (although i only had a quick glance):
- jit.buffer~ is pretty much the same as a regular buffer~. so you don't have to fill it separately, just give it the same reference and you are done.
- also no need to copy the data from jit.buffer~ into a jit.matrix: read directly from jit.buffer~ by setting read points with messages "outputfirst"+offsetIntoBuffer and "outputlast"+offsetIntoBuffer+frameSize and follow that by "output". then you get a jit.matrix full of the specified data (float32), with each audio channel on a separate plane.
or did you do that because of this resampling business?
- you can do cartopol with jit.expr [jit.expr @expr sqrt(in[0].p[0]*in[0].p[0]+in[0].p[1]*in[0].p[1])]. and if you want to square the data afterwards because of RMS you can skip the sqrt, which saves some cpu.
- also for RMS calc you could stay in jitterland, i believe.
- if you window the data, you probably should think about overlapping frames. right now your hop size seems to be equal to your fft size.
hope that helps.
all the best,
vb
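
The read points and the overlap suggestion above can be sketched together: each frame is a (first, last) pair with last = first + frameSize, as in the post, and the hop size decides how far the next frame starts from the previous one. The function name and the 8192-sample length are assumptions for illustration, and the units and end-point convention that jit.buffer~ expects for "outputfirst" and "outputlast" should be checked in its reference.

```python
def frame_ranges(total_samples, frame_size, hop_size):
    """(first, last) pairs for every complete frame, with last = first + frame_size."""
    ranges = []
    first = 0
    while first + frame_size <= total_samples:
        ranges.append((first, first + frame_size))
        first += hop_size
    return ranges

# Hop size equal to the FFT size: frames sit back to back.
no_overlap = frame_ranges(8192, 2048, 2048)

# Hop size of half the FFT size: each frame overlaps its neighbour by 50%.
half_overlap = frame_ranges(8192, 2048, 1024)
```

With a hop of half the frame size, the same 8192 samples yield 7 frames instead of 4.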

woyteg's icon

Thanks for taking a look at it!

jit.buffer~ is pretty much the same as a regular buffer~. so you don’t have to fill it separately, just give it the same reference and you are done.

jit.buffer~ doesn't seem to output a bang on file read completion. To ensure message ordering (getting info about file length and sample rate before analysis) I chose this solution. In my final application of this, it doesn't matter if I waste a bit of RAM.

- also no need to copy the data from jit.buffer~ into a jit.matrix: read directly from jit.buffer~ by setting read points with messages "outputfirst"+offsetIntoBuffer and "outputlast"+offsetIntoBuffer+frameSize and follow that by "output". then you get a jit.matrix full of the specified data (float32), with each audio channel on a separate plane.
or did you do that because of this resampling business?

I had a bad time getting all this to work.. I wanted a bit more of a step-by-step approach to be able to debug this precisely. (again, performance is not my main concern here). But I'm sure you are right.

- you can do cartopol with jit.expr [jit.expr @expr sqrt(in[0].p[0]*in[0].p[0]+in[0].p[1]*in[0].p[1])]. and if you want to square the data afterwards because of RMS you can skip the sqrt, which saves some cpu.
- also for RMS calc you could stay in jitterland, i believe.

This is interesting, I'm going to think about it. But now that I'm at it, I actually want to implement average and maximum for each bin too, so the sqrt optimization won't be that straightforward. (although again, I think this is quite a good point, these are a lot of possibly unnecessary sqrts)

- if you window the data, you probably should think about overlapping frames. right now your hop size seems to be equal to your fft size.

Now this is something I simply forgot about..! hm, I'll definitely look into this, but overall, thank you again, great to have somebody look at it!
All the best!

woyteg's icon

next version: cleaned up a bit and added some options (average or rms, different frame sizes)
(I couldn't seem to paste it here.. too large?? weird.)

pl.fileSpectrum2.maxpat
Max Patch
volker böhm's icon
Max Patch

ok, fine. just in case someone wants to try this, here is a basic example of how i would do the frame reading for offline processing.
all the best,
vb

loadmess's icon

Hello Maxers,
I started looking at jit.fft and jit.buffer~ recently and I have a question about the usage of jit.fft.
jit.fft doesn’t provide any options for window size, hop size, or window type (envelope).
I deduced from this forum thread that if I wish to perform an FFT on my audio sample in jit.buffer~, for instance with a 1024-sample window, a 512-sample hop size and a Hann window,
I have to extract each audio frame (1024 samples) “manually”, apply the window, advance by the hop size, etc., all by patching, through the whole buffer. Am I correct?
That is, instead of giving jit.fft the whole buffer length, which I suppose is also possible, depending on the goal.
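
The manual procedure described in this question (1024-sample frames, 512-sample hop, Hann window) looks roughly like this outside Max. It is a sketch only: the helper names are hypothetical, the periodic form of the Hann window is an assumption, and each resulting frame would then be handed to the FFT.

```python
import math

def hann(n):
    """Periodic Hann window of length n (assumed variant)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def windowed_frames(samples, frame_size=1024, hop_size=512):
    """Cut the signal into overlapping frames and multiply each by the window."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        chunk = samples[start:start + frame_size]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# A constant signal makes the window shape visible in every frame.
samples = [1.0] * 4096
frames = windowed_frames(samples)
```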

Nic's icon

Hi all.

I'm trying to adjust phase of an audio sample by an arbitrary amount by using jit.fft, converting cartesian coordinates to polar coordinates and back to cartesian coordinates after adjusting the phase.

I'm able to successfully rotate the phase by 180 degrees, but not by an arbitrary amount in the patch attached.

Do you have any ideas how I can solve this?

Max Patch

volker böhm's icon

jit.fft gives you the whole picture, the true story so to say, i.e. the upper (mirrored) part is included. A real sinusoid is represented in the FFT by two complex phasors, running in opposite directions. So, in order to shift the phase of all real sinusoids you have to make sure to shift the phase of the upper part of the spectrum in the other direction.

hth, vb
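
The point about the mirrored upper half can be checked with a small sketch: rotate the lower bins by +phi and the mirrored upper bins by -phi, and the result stays real with every sinusoid shifted by phi. The naive DFT, the function names, and the choice to leave DC and Nyquist untouched are assumptions of this example, not something taken from the patch.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def rotate_phase(samples, phi):
    """Shift every sinusoid of a real, even-length signal by phi radians."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(1, n // 2):
        spectrum[k] *= cmath.exp(1j * phi)       # lower half: rotate one way
        spectrum[n - k] *= cmath.exp(-1j * phi)  # mirrored upper half: the other way
    # DC (bin 0) and Nyquist (bin n/2) are left untouched in this sketch.
    return idft(spectrum)

n = 64
cosine = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
shifted = rotate_phase(cosine, -math.pi / 2)  # a cosine shifted by -90 degrees is a sine
```

Rotating both halves in the same direction would instead leave a non-zero imaginary part in the output.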

Nic's icon

Thank you very much for your help and swift reply Volker.

Your answer helped a lot!

I've split the matrix from jit.fft in two parts and processed the phase of the two parts in opposite direction. See the patch attached.

I'm getting close to successfully achieving the phase rotation I want. However, I have some new questions.

It seems the beginning of the phase rotated buffer is now at the end of my buffer as marked in the picture below. Might this be what is called "pre-ringing"?

Also, when using anton.aif it seems that the "result buffer" cuts off.

Both of these effects seem most noticeable when I do a + or - 90 degree shift.

Buffer view of an 808 kick sample rotated by 90 degrees


Do you have any idea as to why this is?

Max Patch

volker böhm's icon

Do you have any idea as to why this is?

As far as I remember jit.fft only works on powers of two - for other matrix dimensions it uses some form of interpolation that seems to spoil the phase rotation thing.

Also, I'm not exactly sure what you try to achieve, but thinking about your original question: the FFT assumes the input signal to be periodic. I don't think you can phase rotate an audiofile of arbitrary size by using one huge FFT frame.
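
If jit.fft does behave as remembered here for sizes that are not powers of two (this is not verified in the thread), one possible workaround is to zero-pad the data to the next power of two before the transform. A minimal sketch with hypothetical helper names and a made-up 3000-sample input:

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad_to_power_of_two(samples):
    """Append zeros until the length is a power of two."""
    target = next_power_of_two(len(samples))
    return list(samples) + [0.0] * (target - len(samples))

padded = zero_pad_to_power_of_two([0.5] * 3000)
```

Padding only addresses the matrix size; the periodicity assumption of the FFT mentioned above still applies to the padded frame.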

Nic's icon

I understand. Thank you very much for your kind help Volker Böhm.

The idea was to find a "fixed optimal phase rotation" for minimising phase cancellation between two audio samples. Inspired by how Melda's MAutoAlign, Soundradix's Pi and Mastering the Mix's FUSER plugins work.

I think I'll leave it for now, and maybe I'll revisit the idea later.
I might try using your approach of slicing the samples into frames instead of processing the whole sample in one frame.

Roman Thilenius's icon

for what reason do you (all) try to do these things with jit.fft, where it is so easy to do the same with pfft~?

loadmess's icon

Personally, one motive to use jit.fft emerges when one needs to perform some FFT processing offline (in the background), rather than in realtime. For instance, jit.fft lets you process a signal directly from a buffer instead of playing the audio in realtime into pfft~.