pfft~ documentation // hop size explanation unclear

David Meyer's icon

The explanation of hop size in the pfft~ Reference, under Overlap Factor, seems a bit unclear, at least to me.

Under Overlap factor, it says:

The hop size (number of samples between each successive FFT window) of Fast Fourier transforms performed is equal to the size of the Fast Fourier transform divided by this overlap factor. (e.g. if the frame size is 512 and the overlap is set to 4 then the hop size is 128 samples).

Under fftsize right before that, it says:

Specifies the FFT size, in samples, of the overlapped windows which are transformed to and from the spectral domain by the FFT/IFFT. The window size must be a power of 2, and defaults to 512. (Note: The size of the spectral "frames" processed by the pfft~ object's subpatch will be half this size, as the 2nd half of the spectrum is a mirror of the first and thus redundant, unless the full-spectrum-flag is present.)

First, it says that the hop size is the fft size divided by the overlap factor, but then in parentheses, it says that the frame size is to be divided by the overlap factor. But the frame size is only half the fft size (as stated in the second quote).

Assuming the FFT size is 4096 samples, the size of one spectral frame is 2048 samples. The overlap factor is 4. What would the hop size be: 1024 or 512?

I have always understood the hop size to refer to the FFT size, not the spectral frame size.

jg's icon

Four years later, and I've come up with the same question! Can anyone shed light on this, please?

Blue Wall Loop's icon

TL;DR - FFT size is how many samples you process at once; hop size is how far you move along between windows. You want a hop size smaller than your FFT size so you don't get artifacts.

I'm not an expert in FFT so feel free to correct anything I may get wrong in this:

Computing the Fourier transform directly (the DFT) typically has O(N^2) complexity, which led to the development of the FFT (Fast Fourier Transform), which is O(N log N). The FFT uses a recursive divide-and-conquer approach, which is why a power of 2 is preferred for the FFT size.
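To make the O(N^2) versus O(N log N) point concrete, here is a minimal sketch in plain Python. The function names `naive_dft` and `fft_radix2` are my own, and this is only a textbook radix-2 recursion for illustration; I am not claiming it is how pfft~ is implemented internally.

```python
import cmath


def naive_dft(x):
    """Direct DFT: N output bins, each a sum over N samples, so O(N^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT. The length must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return list(x)
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Illustrative test signal: a sine at a quarter of the sample rate.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
slow = naive_dft(signal)
fast = fft_radix2(signal)
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print("largest difference between the two results:", max_diff)
```

The halving only works cleanly all the way down to a single sample when the length is a power of 2, which is why the sketch rejects other lengths.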

Three parts typically matter for FFT

  1. Sample Rate (Fs)

  2. Time Frame (N)

  3. Hop Size (H)

The Time Frame is what I believe is referred to as fftsize; it is the "window" of samples you process in one step.
The Hop Size is then the distance you step along once a window has been processed. You want a hop size smaller than your Time Frame so that you get a better picture of how the frequency content changes over time. This is why the overlap factor is typically 4: the hop size is a quarter of the Time Frame, so you have to hop four times to move the full length of one Time Frame.

You could set your Time Frame and Hop Size to the same size, but you would get a less accurate picture of how the frequencies change over time.
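A rough sketch of that stepping, with a deliberately tiny window so the numbers are easy to read. `frame_starts` is a made-up helper for illustration, not anything from Max:

```python
def frame_starts(num_samples, fft_size, hop_size):
    """Start index of every full analysis window that fits in the signal."""
    return list(range(0, num_samples - fft_size + 1, hop_size))


fft_size = 8       # tiny on purpose; pfft~ would use e.g. 512 or 1024
num_samples = 32

# Hop size equal to the window: windows sit side by side, no overlap.
no_overlap = frame_starts(num_samples, fft_size, hop_size=fft_size)

# Hop size of a quarter window: four hops to travel one full window.
overlap_4 = frame_starts(num_samples, fft_size, hop_size=fft_size // 4)

print(no_overlap)       # [0, 8, 16, 24]
print(overlap_4[:5])    # [0, 2, 4, 6, 8]
print(len(no_overlap), len(overlap_4))  # 4 13
```

With the smaller hop you get more than three times as many snapshots of the same 32 samples, which is where the finer time detail comes from.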

There was also a small mention of the full spectrum, which is typically halved when processed. This is because the FFT also produces the negative-frequency part, which is not needed as it mirrors the first half. (This doesn't affect your Time Frame or Hop Size.)
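If you want to see the mirroring for yourself, here is a small check in plain Python, using a direct DFT and some arbitrary real-valued samples I made up. For a real input, bin N-k is the complex conjugate of bin k, so the magnitudes in the second half repeat those in the first.

```python
import cmath


def dft(x):
    """Direct DFT, fine for a handful of samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Arbitrary real-valued samples, chosen only for illustration.
signal = [0.5, 1.0, -0.25, 0.75, 0.0, -1.0, 0.3, 0.2]
n = len(signal)
spectrum = dft(signal)

# Compare each bin above Nyquist with its partner below Nyquist.
mirrored = all(
    abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9
    for k in range(1, n // 2)
)

# Bins 0 (DC) through n/2 (Nyquist) carry all the information.
unique_bins = n // 2 + 1

print("second half mirrors the first:", mirrored)
print("bins carrying unique information:", unique_bins)
```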

Hope this helps
Blue

jg's icon

Thanks, Blue! But the meanings of the terms FFT size and hop size weren't the OP's question. The problem is that the help files are ambiguous as to how large a hop size you get as a result of the arguments you set.

First, it states that the 'frame size' is half the size of the FFT size.

Then, it says that "The hop size... is equal to the size of the Fast Fourier transform divided by this overlap factor. (e.g. if the frame size is 512 and the overlap is set to 4 then the hop size is 128 samples)."

So is the hop size the FFT size divided by the overlap factor, or the frame size divided by the overlap factor? For instance, if you give pfft~ the arguments [pfft~ patchername 1024 4], then FFT size is 1024, which means that the frame size is 512 (according to the help file). We then divide either the FFT size or the frame size by 4, which is either 256 or 128.

I'm assuming that the material about the frame size is just poorly expressed, and the hop size is, in fact, FFT size / overlap factor, but it'd be nice to know for sure!

Best,

J

Blue Wall Loop's icon

Hi J,

Looking back at what you and the OP said, I realised I may have made it more complicated than needed. I was mostly trying to explain how the FFT works so that those who are confused might be able to understand the process, but like you said, having the correct definition would be useful!

I think the confusion comes from the use of "frames": the docs use it to describe the output bins, which are halved (because the other half is the negative-frequency mirror of the part you keep). This gets confused with the "frame" of samples that the FFT size describes.

FFT size is the total number of samples you process at once. In your [pfft~ patchername 1024 4] example it is 1024. The output (the 1024 frequency bins) is then halved for the reason discussed above. The hop size is therefore calculated as 256, because we take 1024 / 4. pfft~ handles the frequency bins going out, which gives you 512 bins (+ the Nyquist bin).
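Here is the same arithmetic as a small Python sketch. The helper `pfft_sizes` is hypothetical, and it assumes the reading above: the hop size comes from the FFT size (not the spectral frame size), and the subpatch sees half the FFT size per frame when the full-spectrum flag is not set.

```python
def pfft_sizes(fft_size, overlap):
    """Return (hop_size, spectral_frame_size) under this thread's reading.

    Assumption: hop size = FFT size / overlap factor, and the spectral
    frame is fft_size / 2 bins (no full-spectrum flag).
    """
    for name, value in (("fft_size", fft_size), ("overlap", overlap)):
        if value < 1 or value & (value - 1):
            raise ValueError(f"{name} must be a power of 2")
    return fft_size // overlap, fft_size // 2


for fft_size in (512, 1024, 4096):
    hop, frame = pfft_sizes(fft_size, 4)
    print(f"fft_size={fft_size} overlap=4 -> hop={hop}, spectral frame={frame}")
```

Under this reading, the 512 / 4 = 128 example in the docs works out if "frame size" there is just a loose name for the FFT size, and the OP's 4096 case gives a hop of 1024.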

Below are my edits to the docs, which may make them easier to understand. Take the first with a pinch of salt, though, as I'm not fully sure whether the object processes all the samples of the FFT and halves the output, or halves the FFT size and then keeps all the bins. I believe it's the former, because if you only used half the samples you would get fewer frequencies out, for the same reason we halve them.

FFT-size[int]samples

optional

Specifies the FFT size, in samples, of the overlapped windows which are transformed to and from the spectral domain by the FFT/IFFT. The window size must be a power of 2, and defaults to 512. The minimum value is 32, unless legacy is enabled in which case it is 16. The maximum value is 1048576. (Note: The total samples processed will be equal to the FFT-size but the total bins out will be half this size... The size of the spectral "frames" processed by the pfft~ object's subpatch will be half this size, as the 2nd half of the spectrum is a mirror of the first and thus redundant, unless the full-spectrum-flag is present.)

overlap-factor (hop-size-denominator)[int]

optional

The third argument determines the overlap factor for FFT analysis and resynthesis windows. The hop size (number of samples between each successive FFT window) of Fast Fourier transforms performed is equal to the size of the Fast Fourier transform (FFT-size) divided by this overlap factor. (e.g. if the frame size is 512 and the overlap is set to 4 then the hop size is 128 samples). The value must be a power of 2 and defaults to 2. A value of 4 is recommended for most applications.

Let me know if this makes the docs easier to understand and use.

All the best,
Blue

jg's icon

Hi Blue,

Many thanks indeed - this helps a lot. Your edits deal with the confusion nicely - I hope they get implemented in the official documentation!

Best,

John