Block-based processing inside pfft~


    Dec 02 2019 | 9:50 am
    Dear all,
    I'm thinking about developing a series of low-latency FFT-based spectral processing tools (pitch shifting, frequency shifting, spectral warping). This requires being able to address whole FFT frames at once rather than bin by bin; as far as I know, per-bin processing does not allow, for instance, shifting frequency bins down without the cost of an extra latency of one block size (an overall latency of two block sizes instead of one).
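    To make that concrete, something like this rough C sketch (hypothetical names, just an illustration) is the kind of whole-frame operation I mean; shifting down by shift bins reads input bins that lie above the output bin being written, which is impossible while the bins are still arriving one at a time:

        /* Shift a spectrum down by `shift` bins, given the whole frame at once.
           `in` and `out` hold nbins complex bins as interleaved re/im pairs. */
        void shift_down(const float *in, float *out, int nbins, int shift)
        {
            for (int k = 0; k < nbins; k++) {
                int src = k + shift;                /* output bin k needs input bin k+shift */
                if (src < nbins) {
                    out[2*k]     = in[2*src];       /* real part      */
                    out[2*k + 1] = in[2*src + 1];   /* imaginary part */
                } else {
                    out[2*k]     = 0.0f;            /* nothing left above to pull down */
                    out[2*k + 1] = 0.0f;
                }
            }
        }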
    As far as I know, the only way to do that is with an external (as gizmo~ does, for instance), since gen~, like traditional Max patching, processes sample by sample rather than block by block; I know of no way in gen~ to fetch the whole incoming block of input data, only the most recent samples.
    Am I right?
    Best
    Alexis

    • Dec 02 2019 | 3:14 pm
      Dear Alexis,
      I am also only aware of being able to process sample-wise/bin-wise data within the [pfft~] object. Have you seen that you could use [capture~] to accumulate a buffer? But I guess this doubles your desired processing time (two frames). Did you look at the [fft~] object too? It at least gives you a sync signal as soon as one block has passed through.
      One workaround might be to rephrase your algorithm to work recursively for each sample instead of requiring entire blocks - if that is possible.
      Keeping an eye on this. This is a very interesting question and might well end with: not natively possible, requires the user to compile an external...
    • Dec 02 2019 | 6:46 pm
      Dear Webe,
      thanks for your answer. Indeed, the problem with storing the FFT data in a buffer~ or using capture~ is the extra delay of one block, and for one reason: otherwise it is impossible to use the high-frequency input to create low-frequency output, since the high-frequency input is not yet there when we need to produce the low-frequency output. A simple example: shifting the frequencies down by 1000 Hz requires, in order to produce for instance the 500 Hz information at fftout~, knowledge of the 1500 Hz information arriving at fftin~.
      This is why I think that block-based processing is the only way to overcome this: all FFT bins are then processed at once, with a full view of every bin arriving at the input.
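      In bin terms (a worked example with an assumed 1024-point FFT at 44.1kHz, just to put numbers on it):

          #include <math.h>
          #include <stdio.h>

          int main(void) {
              const double sr = 44100.0;
              const int    N  = 1024;                        /* assumed FFT size          */
              double bin_hz   = sr / N;                      /* ~43 Hz per bin            */
              int shift_bins  = (int)round(1000.0 / bin_hz); /* 1000 Hz down ~ 23 bins    */
              int out_bin     = (int)round(500.0 / bin_hz);  /* the 500 Hz bin to fill... */
              int in_bin      = out_bin + shift_bins;        /* ...needs the ~1500 Hz bin */
              printf("out bin %d (%.0f Hz) needs in bin %d (%.0f Hz)\n",
                     out_bin, out_bin * bin_hz, in_bin, in_bin * bin_hz);
              return 0;
          }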
      Best
      Alexis
    • Dec 05 2019 | 4:58 pm
      Unfortunately there are no MSP capabilities to process blocks of audio at once in that way. Since both pfft~ and fft~ output their spectra sample by sample, you will always end up with an extra block of delay as you accumulate the spectrum back into a buffer~ (or gen~ data, or whatever) for processing en masse. I guess if you are using large FFT sizes in a real-time performance where latency is critical, this would be an issue.
      The only way I can imagine avoiding the extra block of delay is to do the block-based processing in Jitter, via jit.fft, as this really should give you access to the spectrum as a block of data as soon as it is ready.
      Graham
    • Dec 06 2019 | 7:57 am
      Well, the attached below is about the best I can do in this vein. Definitely jit.catch~/jit.release~ is more useful here, and opens up a realm of Jitter-based block processing of audio signals that I at least hadn't considered before. Some quite interesting possibilities to explore.
      For your purposes, I wonder if this helps or not. With SIAI (Scheduler in Audio Interrupt) and jit.catch~, this all happens in the audio thread between block boundaries, so in theory no latency is added by the Jitter processing itself; but latency is added getting the audio back and forth between MSP and Jitter. For a 512-sample FFT, it comes to 512 samples for jit.catch~, 256 samples due to the overlapping FFTs (I used 2x overlap in this patcher), and approximately 256 samples of latency added at jit.release~, which was the lowest I could set it before the audio started glitching. This might be slightly better than what pfft~ and buffer~ processing would get you, but it demands SIAI and may glitch out if there's a lot of processing going on elsewhere.
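      Summing those figures (a rough budget under the assumptions above, not a measurement):

          #include <stdio.h>

          int main(void) {
              const double sr = 44100.0;
              int catch_stage   = 512;  /* jit.catch~ collecting one 512-sample frame    */
              int overlap_stage = 256;  /* offset for the 2x-overlapped FFT              */
              int release_stage = 256;  /* lowest glitch-free jit.release~ latency found */
              int total = catch_stage + overlap_stage + release_stage;
              printf("%d samples = %.1f ms at 44.1kHz (two 512-sample blocks)\n",
                     total, 1000.0 * total / sr);
              return 0;
          }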
      Graham
    • Dec 06 2019 | 8:27 am
      Dear Graham, thanks for your answers. I wouldn't have thought that a solution through Jitter could be that reliable for audio processing; this is very interesting. A latency of two block sizes is still too much for my purposes (I'm trying to get under 10ms). pfft~ allows an overall latency of (block size - vector size) for overlaps of 2 and 4, which is much closer to what I'm looking for.
      I will investigate that further.
      Best
      alexis
    • Dec 06 2019 | 11:13 am
      Hi Alexis,
      I have been thinking about something like you describe for quite some time now. From my research, the only sensible way to achieve low-latency FFT block processing is to write externals in C.
      Once that is established, there is an important design decision to be made. Are you thinking only of a set of externals designed to be used inside a pfft~, doing FFT processing the Max/MSP way, or would you also consider externals that manage the FFT processing parameters internally (FFT size, overlap, windowing, etc.)? Going the Max/MSP way is obviously easier and integrates better into the environment.
      However, one of the things I have been missing in Max is the ability to interact with the FFT analysis parameters in real time (FFT size, hop size, window type, zero-padding, etc.). This is not currently possible using pfft~.
      Last but not least, various opportunities for parallelization come to mind. A 16x-overlap FFT process could be spread across many CPU or even GPU cores. I am pretty sure that pfft~ does not use the GPU, but is pfft~ able to parallelize its processing load across CPU cores?
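      To give a sense of what I mean, here is a minimal sketch of the per-hop core such an external might run (plain C with FFTW, no Max SDK glue, and all names hypothetical); the point is simply that the FFT size, hop and window become runtime parameters of the object instead of being fixed by a pfft~ wrapper:

          #include <math.h>
          #include <fftw3.h>

          typedef struct {
              int            fftsize, hop;  /* runtime-settable analysis parameters      */
              float         *window;        /* analysis/synthesis window, length fftsize */
              float         *frame;         /* time-domain work buffer, length fftsize   */
              fftwf_complex *spec;          /* fftsize/2 + 1 complex bins                */
              fftwf_plan     fwd, inv;
          } stft_t;

          /* (Re)build buffers and plans; called whenever fftsize or hop changes
             (freeing any previously allocated buffers is omitted in this sketch). */
          static void stft_configure(stft_t *s, int fftsize, int hop)
          {
              s->fftsize = fftsize;
              s->hop     = hop;
              s->window  = fftwf_malloc(sizeof(float) * fftsize);
              s->frame   = fftwf_malloc(sizeof(float) * fftsize);
              s->spec    = fftwf_malloc(sizeof(fftwf_complex) * (fftsize / 2 + 1));
              s->fwd = fftwf_plan_dft_r2c_1d(fftsize, s->frame, s->spec, FFTW_ESTIMATE);
              s->inv = fftwf_plan_dft_c2r_1d(fftsize, s->spec, s->frame, FFTW_ESTIMATE);
              for (int n = 0; n < fftsize; n++)   /* Hann window */
                  s->window[n] = 0.5f - 0.5f * cosf(6.2831853f * n / fftsize);
          }

          /* Process one hop: window the newest fftsize input samples, transform,
             hand the WHOLE spectrum to a processing callback, transform back and
             overlap-add into an output accumulator. (Window-sum normalization for
             perfect reconstruction is left out of this sketch.) */
          static void stft_hop(stft_t *s, const float *in, float *ola_out,
                               void (*process)(fftwf_complex *spec, int nbins))
          {
              for (int n = 0; n < s->fftsize; n++)
                  s->frame[n] = in[n] * s->window[n];
              fftwf_execute(s->fwd);

              process(s->spec, s->fftsize / 2 + 1);  /* block-based spectral processing */

              fftwf_execute(s->inv);                 /* FFTW's c2r is unnormalized...   */
              for (int n = 0; n < s->fftsize; n++)   /* ...so scale by 1/fftsize        */
                  ola_out[n] += s->frame[n] * s->window[n] / s->fftsize;
          }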
      Any further thoughts?
    • Dec 06 2019 | 6:09 pm
      Hi Alexis,
      I did some latency measuring in the modified patcher below. First, play some kind of pure tone through it and wind the metro/latency duration down as low as it can safely go before it glitches. For me it works even at 4ms; lower than that and I get dropouts in the output (and higher than 11ms I get dropouts in the input).
      Then switch the selector~ to choose the click~ input, and the gen~ patcher will measure the latency of the click from input to output. The roundtrip latency seems to depend on Max's I/O vector size (IOVS): at 64 samples the latency is under 20ms (sometimes as low as 12ms); at 512 samples the latency is 30ms. This is at a 44.1kHz sampling rate. (At 48kHz I can get 11ms with a 256-sample IOVS.) It might also depend on some of Max's scheduler settings.
      This is for a 512-point FFT. At 44.1kHz there is a minimum latency of 12ms just to get a 512-point FFT spectrum no matter what technique is used. So the Jitter route is adding less than 8ms to this minimum necessary latency.
      At least some of this is due to the [delay~] objects in there, which I used as a hack for quickly getting 2x overlap. It should be possible to do without the delay by instead using two jit.catch~/jit.release~ pairs triggered alternately, which might shave up to 6ms off the roundtrip latency. I'm just not sure I can figure out how to get them to do it. The reason jit.catch~ works is that it is driven by the metro, which is on the high-priority thread.
      That said, you could also probably achieve all this using plain fft~, buffer~ and gen~ objects, which would be easier than writing an external.
    • Dec 06 2019 | 7:28 pm
      @Graham: your patch is a very interesting way to prototype: contrary to pfft~, it allows perfect reconstruction of the input signal when no transformation is applied, at least with a rectangular window. With a triangular window, perfect reconstruction no longer works, which is to be expected since there is no normalization like Griffin-Lim; this should lead (I have to double-check) to a modulated output. With your patch and IOVS set to 128, I get a latency of 1024 samples, not bad at all. The problems I encountered are instabilities and crashes, so I would rule it out for live performance, at least until I better understand how to overcome them.
      @Luigi: if I had to start writing an external, I would probably go directly for an embedded FFT using FFTW, for the reasons you mention, and also to be able to control the way the time signal is reconstructed: even though pfft~ has the advantage of simplicity and low latency (the latency actually equals block size minus vector size), I didn't find a way to ensure perfect reconstruction when the STFT frames are not modified, even with a rectangular window.
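      That reconstruction control mostly comes down to normalizing the overlap-add: divide the overlap-added output by the overlap-added squared window (a rough sketch below, same hypothetical C setup as the sketch above, with the same window applied at analysis and at synthesis). The denominator is periodic with the hop size, and as long as it never reaches zero, unmodified frames reconstruct exactly for any window:

          /* Per-sample normalization for windowed overlap-add (WOLA) resynthesis.
             With window w[] of length fftsize applied at analysis and synthesis
             and a hop of hop samples, the denominator at output sample n
             depends only on n modulo hop, so one hop's worth is enough. */
          static void wola_norm(const float *w, int fftsize, int hop,
                                float *norm /* length hop */)
          {
              for (int n = 0; n < hop; n++) {
                  float s = 0.0f;
                  for (int m = n; m < fftsize; m += hop)
                      s += w[m] * w[m];          /* sum of overlapping w^2 terms */
                  norm[n] = s;
              }
          }

          /* Usage: divide each overlap-added output sample x[i] by norm[i % hop],
             guarding against zero for pathological window/hop combinations. */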
      Best
      Alexis
    • Dec 06 2019 | 8:24 pm
      Yes, I see instabilities too -- it's definitely pushing Max in a direction it wasn't exactly designed for, so not that surprising. I had a different version of this working really solidly, then suddenly it crashed and the crash recovery didn't recover the gen~ code; now I can't seem to rebuild it, ugh.
      Probably it doesn't matter. At least for low latency, pfft~ with 8x overlap is already pretty amazing; the overlap cancels out a lot of the latency incurred by buffering up spectra before processing.
      For a 512-point FFT the roundtrip latency is 448 samples; with buffer~-based processing inside it, and with 8x overlap, it only grows to 512 samples of roundtrip latency, 11.6ms at 44.1kHz or 10.7ms at 48kHz. That's pretty good. In other words, buffer~-based processing can add less than 2ms if you are using overlap.
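      Written out (a back-of-the-envelope check, assuming a 64-sample signal vector size, which is what makes 512 - 64 = 448, and reading the buffer~ stage as adding one hop of delay):

          #include <stdio.h>

          int main(void) {
              const double sr44 = 44100.0, sr48 = 48000.0;
              const int fftsize = 512, vs = 64, overlap = 8;

              int pfft_rt   = fftsize - vs;       /* 448 samples: plain pfft~ roundtrip     */
              int hop       = fftsize / overlap;  /*  64 samples: one hop at 8x overlap     */
              int buffer_rt = pfft_rt + hop;      /* 512 samples: with buffer~-based frames */

              printf("plain pfft~ : %d samples = %.1f ms at 44.1kHz\n",
                     pfft_rt, 1000.0 * pfft_rt / sr44);
              printf("with buffer~: %d samples = %.1f ms at 44.1kHz, %.1f ms at 48kHz\n",
                     buffer_rt, 1000.0 * buffer_rt / sr44, 1000.0 * buffer_rt / sr48);
              return 0;
          }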
      Be careful to make the buffers have unique names though...