Spectrum of a buffer/audio file


    Oct 14 2014 | 10:46 pm
    Hi! Any ideas how I could obtain the spectrum of an entire audio file as fast as possible? By the spectrum of the file, I mean the average value of each bin, over the entire file, for example. In order to be faster than realtime, I could imagine upsampling would be an option, or using uzi to retrieve the individual signal vectors as fast as possible..hm.. what do you guys think? Very much appreciating any hints, all the best

    • Oct 15 2014 | 7:47 am
      hm, would not jit.buffer~ make the magic ? this guy made a dissertation with a lot of audio-to-jitter : https://cycling74.com/forums/share-sonographic-sound-processing-diss/ this is overkill in your case, but i think inside this you can find what you wish for, no ?
    • Oct 15 2014 | 7:50 am
      hm i'm not sure of what i'm saying anymore. this guy apparently doesn't use jit.buffer~...
    • Oct 15 2014 | 8:12 am
      Another way may be to use ircam's free MuBu library.
    • Oct 15 2014 | 8:35 am
      still, jit.buffer~ seems like a lead: with jit.fft you could manually slice the output of a jit.buffer~ into spectral frames and...
      edit :indeed MuBu seems a better idea
    • Oct 15 2014 | 12:27 pm
      thanks guys! Both of these are solutions I wouldn't have thought of (and I didn't know this library), happy I asked. I'm gonna look into this.. cheers!
    • Oct 15 2014 | 12:43 pm
      hi again, I just downloaded the library and it really seems *very* impressive. Do you guys have any hints on how to start with extracting the spectrum? I'm looking through the help patches at the moment and can't quite find a way (and get distracted by other great features ;))
    • Oct 15 2014 | 1:36 pm
      Check the mubu-mosaicing.maxpat example in the mubu-pipo folder. It shows how to apply an MFCC analysis, but it should be easy to modify.
    • Oct 15 2014 | 8:58 pm
      hi! This is great, I'll definitely post the results here as I find it really useful to have such a tool when it's finished.. One thing is left to complete this: I can't find the documentation of the pipo modules.. so I don't quite know how to configure the framesize for example. I guess it's @fft.framesize or something, but do you guys know where i can find all the pipos and their parameters? thanks again, you were a huge help already!
    • Oct 15 2014 | 9:58 pm
      so, here is a finished abstraction that makes use of the MuBu library. It needs the mubu object, mubu.track and mubu.process. This abstraction is, of course, poorly tested. However, I will need this to work reliably, so in case I encounter any bugs, I will update this thread (please tell me if you are using this and have any problems, because again, I will need this to work reliably..). This is the abstraction, save it as pl.fileSpectrum.maxpat:
      in order for this test patcher to work:
      thanks again to vichug and patrick!
    • Oct 23 2014 | 6:26 am
      I ran into quite a few problems with mubu.. any clever native max solutions? I mean there has to be something.. I didn't seriously look into the jitter suggestion, I have to admit. I will soon.. but I'm kind of sceptical.. any other ideas? Thanks!
    • Oct 23 2014 | 11:33 am
      i'm not quite clear what you want to do. if you are interested in the spectrum of a whole buffer then you might like to check out vb.FFTWbuf~. you can find it here http://www.esbasel.ch/software/#vb-objects it takes one huge fft of the whole buffer, i.e. the freq resolution is directly dependent on the buffer size. and of course it's not a native solution.
      if you are more interested in the average spectrum and want to specify the number of freq bins (fft size) yourself (as in audacity, for example), then jit.fft is probably the way to go, although it will be slower. if you need a kick start, let me know - it's not hard to do.
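      The "average spectrum with a chosen fft size" idea can be sketched outside Max as follows. This is only an illustration in plain Python, not a jit.fft patch: the function names `dft_magnitudes` and `average_spectrum` are made up for this example, the DFT is a slow naive one, and the frames are non-overlapping and unwindowed to keep the sketch short.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of bins 0..N/2 of one frame (naive O(N^2) DFT, for illustration)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def average_spectrum(samples, fft_size):
    """Mean magnitude of each bin over all complete, non-overlapping frames."""
    frame_count = len(samples) // fft_size
    if frame_count == 0:
        raise ValueError("signal is shorter than one frame")
    totals = [0.0] * (fft_size // 2 + 1)
    for f in range(frame_count):
        frame = samples[f * fft_size:(f + 1) * fft_size]
        for k, mag in enumerate(dft_magnitudes(frame)):
            totals[k] += mag
    return [t / frame_count for t in totals]


# Example: a sine with exactly 4 cycles per 32-sample frame, 8 frames long.
fft_size = 32
signal = [math.sin(2 * math.pi * 4 * i / fft_size) for i in range(fft_size * 8)]
spectrum = average_spectrum(signal, fft_size)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

      Here the number of bins depends only on `fft_size`, not on the file length, which is the difference from the single huge FFT described above.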
    • Oct 23 2014 | 9:27 pm
      Hi volker! Thanks for taking the time. First, a theoretical question: what would be the difference between taking the Fourier transform of the whole signal vs. the average of STFTs/FFTs? I mean in the result? Of course I would have orders of magnitude better frequency resolution in the Fourier transform of the whole signal (which I don't need), but those would be accumulated values/amplitudes, right? So it would be an average too, right? Eventually I would need RMS, so that wouldn't be the best solution I guess.. anyway, maybe I misunderstand something here. Thanks for the hint to the external, I'll definitely check it out. At the moment I'm trying to get the jitter version to run.. but since you offered
      if you need a kick start, let me know – it’s not hard to do.
      Well I can't resist, that would indeed be great! Thanks!
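      On the whole-file versus averaged-frames question, two relations can be written down exactly under simplifying assumptions. The sketch below assumes rectangular (unwindowed), non-overlapping frames of length M that tile the N = FM samples exactly; the symbols x, X, X_m, N, M and F are introduced here for illustration and do not come from the thread.

```latex
\begin{aligned}
X[k] &= \sum_{n=0}^{N-1} x[n]\, e^{-2\pi i k n / N}, \qquad
X_m[j] = \sum_{n=0}^{M-1} x[mM+n]\, e^{-2\pi i j n / M}, \qquad N = FM, \\
X[jF] &= \sum_{m=0}^{F-1} X_m[j], \\
\frac{1}{N}\sum_{k=0}^{N-1} \lvert X[k]\rvert^2
  &= \sum_{n=0}^{N-1} x[n]^2
  = \frac{1}{M}\sum_{m=0}^{F-1}\sum_{j=0}^{M-1} \lvert X_m[j]\rvert^2 .
\end{aligned}
```

      So, under these assumptions, the whole-file transform at the matching bins is the complex sum of the frame spectra (phases included), while an average or RMS of frame magnitudes discards the phase relation between frames. The two agree on total energy but not bin by bin.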
    • Oct 24 2014 | 2:12 am
      Well so here is a max/jitter native version for anybody interested in it. For now it does a 2048 bin blackman windowed FFT, using jit.fft. If you put in a straight sine, you will see errors at about -80dB. I don't know what that is.. missing zero padding? just a bug? normal? hm.. maybe you guys have an idea. Anyway, otherwise it seems to work pretty well. If anybody has problems with it, please let me know.. all the best, and thanks again to volker, patrick, and vichug
      and a test patcher, expecting the above to be called pl.fileSpectrum2 and the HISS library spectrogram~ installed (just for the nice grid for the graph, I was too lazy to configure plot~)
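      One candidate explanation for low-level "errors" around a pure sine is ordinary spectral leakage of the window, though nothing in the thread confirms that this is what the patch shows. The sketch below only illustrates the effect in plain Python: it uses the classic Blackman coefficients (0.42, 0.5, 0.08), a short 256-point frame instead of 2048, and a naive DFT. All names are made up for this example.

```python
import cmath
import math


def blackman(n):
    """Classic Blackman window (0.42, 0.5, 0.08), periodic form."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * i / n) + 0.08 * math.cos(4 * math.pi * i / n)
            for i in range(n)]


def magnitude_db(frame):
    """Bin magnitudes in dB relative to the strongest bin (naive DFT, bins 0..N/2-1)."""
    n = len(frame)
    mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
            for k in range(n // 2)]
    peak = max(mags)
    # Clamp to avoid log10(0) for bins that happen to be exactly zero.
    return [20 * math.log10(max(m, 1e-300) / peak) for m in mags]


size = 256
cycles = 10.5  # deliberately halfway between two bins
window = blackman(size)
frame = [w * math.sin(2 * math.pi * cycles * i / size) for i, w in enumerate(window)]
levels = magnitude_db(frame)
peak_bin = levels.index(max(levels))

# Print the levels around the peak to see the main lobe and the first side lobes.
for k in range(4, 18):
    print(k, round(levels[k], 1))
```

      The bins next to the peak are not silent even though the input is a single sine, and the level falls off with distance from the peak. Comparing such a curve with the patch output would be one way to check whether leakage accounts for what you see.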
    • Oct 24 2014 | 2:14 am
      forgot the patcher:
    • Oct 24 2014 | 2:19 am
      Oh, just in case anybody is really using this: this outputs the RMS of each bin, over the entire file, and afterwards normalizes the result.
    • Oct 24 2014 | 3:44 pm
      looks good and seems to do what you want. here are a couple of thoughts (although i only had a quick glance):
      - jit.buffer~ is pretty much the same as a regular buffer~. so you don't have to fill it separately, just give it the same reference and you are done.
      - also no need to copy the data from jit.buffer~ into a jit.matrix: read directly from jit.buffer~ by setting read points with messages "outputfirst"+offsetIntoBuffer and "outputlast"+offsetIntoBuffer+frameSize and follow that by "output". then you get a jit.matrix full of the specified data (float32), with each audio channel on a separate plane. or did you do that because of this resampling business?
      - you can do cartopol with jit.expr [jit.expr @expr sqrt(in[0].p[0]*in[0].p[0]+in[0].p[1]*in[0].p[1])]. and if you want to square the data afterwards because of RMS you can skip the sqrt, which saves some cpu.
      - also for RMS calc you could stay in jitterland, i believe.
      - if you window the data, you probably should think about overlapping frames. right now your hop size seems to be equal to your fft size.
      hope that helps. all the best, vb
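      The "skip the sqrt" point can be illustrated in plain Python: for an RMS, the squared magnitude re*re + im*im is accumulated directly and a single square root is taken per bin at the end. This is a sketch of the arithmetic only, not Jitter code; `rms_per_bin`, `normalize` and the sample frames are hypothetical.

```python
import math


def rms_per_bin(frames):
    """RMS magnitude of each bin across frames of complex FFT output.

    Accumulates re^2 + im^2 directly, so no per-frame sqrt is needed;
    one sqrt per bin is applied after averaging.
    """
    if not frames:
        raise ValueError("need at least one frame")
    power = [0.0] * len(frames[0])
    for frame in frames:
        for k, value in enumerate(frame):
            power[k] += value.real * value.real + value.imag * value.imag
    return [math.sqrt(p / len(frames)) for p in power]


def normalize(values):
    """Scale so the largest value is 1.0; an all-zero input is returned unchanged."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)


# Two hypothetical frames with two bins each.
frames = [[3 + 4j, 1 + 0j], [0 + 0j, 0 + 1j]]
rms = rms_per_bin(frames)
normalized = normalize(rms)
```

      A plain average or a per-bin maximum of magnitudes, by contrast, does need the magnitude of every frame, so the shortcut applies to the RMS case only.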
    • Oct 24 2014 | 7:16 pm
      Thanks for taking a look at it!
      jit.buffer~ is pretty much the same as a regular buffer~. so you don’t have to fill it separately, just give it the same reference and you are done.
      jit.buffer~ doesn't seem to output a bang on file read completion. To ensure message ordering (getting info about file length and sample rate before analysis) I chose this solution. In my final application of this, it doesn't matter if I waste a bit of RAM.
      - also no need to copy the data from jit.buffer~ into a jit.matrix: read directly from jit.buffer~ by setting read points with messages "outputfirst"+offsetIntoBuffer and "outputlast"+offsetIntoBuffer+frameSize and follow that by "output". then you get a jit.matrix full of the specified data (float32), with each audio channel on a separate plane. or did you do that because of this resampling business?
      I had a bad time getting all this to work.. I wanted a bit more of a step-by-step approach to be able to debug this precisely. (again, performance is not my main concern here). But I'm sure you are right.
      - you can do cartopol with jit.expr [jit.expr @expr sqrt(in[0].p[0]*in[0].p[0]+in[0].p[1]*in[0].p[1])]. and if you want to square the data afterwards because of RMS you can skip the sqrt, which saves some cpu. - also for RMS calc you could stay in jitterland, i believe.
      This is interesting, I'm going to think about it. But now that I'm at it, I actually want to implement average and maximum for each bin too, so the sqrt optimization won't be that straightforward. (although again, I think this is quite a good point, these are a lot of possibly unnecessary sqrts)
      - if you window the data, you probably should think about overlapping frames. right now your hop size seems to be equal to your fft size.
      Now this is something I simply forgot about..! hm, I'll definitely look into this, but overall, thank you again, great to have somebody look at it! All the best!
    • Oct 24 2014 | 8:27 pm
      next version: cleaned up a bit and added some options (average or rms, different frame sizes) (couldn't seem to be able to paste it here.. too large?? weird.)
    • Oct 25 2014 | 1:32 pm
      ok, fine. just in case someone wants to try this, here is a basic example of how i would do the frame reading for offline processing. all the best, vb
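      As a rough illustration of frame reading for offline processing (not volker's patch), the read points for each frame can be generated up front. The sketch follows the arithmetic quoted earlier in the thread, first = offset and last = offset + frameSize, and adds a hop size so frames can overlap. `frame_ranges` is a hypothetical helper. Whether jit.buffer~ expects these values in samples or another unit, and whether the end point is inclusive, is not settled here.

```python
def frame_ranges(total_samples, frame_size, hop_size):
    """Yield (first, last) offsets for every complete analysis frame.

    last = first + frame_size mirrors the thread's description; treat the
    unit and the inclusivity of the end point as assumptions to verify.
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")
    first = 0
    while first + frame_size <= total_samples:
        yield first, first + frame_size
        first += hop_size


# 10 samples, frames of 4 with a hop of 2 (50% overlap).
ranges = list(frame_ranges(10, 4, 2))
```

      Each pair would then correspond to one "outputfirst" / "outputlast" / "output" round, driven as fast as the scheduler allows instead of in real time.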