http://cycling74.com/2006/11/02/the-phase-vocoder-%E2%80%93-part-i/

or just a picture of a pfft~ subpatcher:

http://img543.imageshack.us/img543/9639/pfftsubpatch.png

1. What is the role of the actual phase vocoder part in the pfft~ subpatch? It seems to me that synchronous granular synthesis (SGS) is going on inside the pfft~ and that the FFT is just resynthesizing the sound (for nothing). I know I am missing something important here, but I just can’t understand what is going on. I tried moving the time-domain part of the pfft~ subpatch into the main patch and deleting the pfft~. The result was an “SGS” with no overlap, or with overlap 2 (if I manually set the hop size for the “previous” window, which is otherwise set by fftinfo~). Where and how does the “proper” overlap happen, given that the sound is evidently smooth?

2. If I understand correctly, with overlap 4 the sampling rate inside the pfft~ subpatch is 4 times higher than in the parent patch. Does this create a 4×2=8 overlap (×2 because we are reading 2 windows at the same time)?

3. Why is frameaccum~ needed in this patch? I am quite confused about the running phase and the frameaccum~ object in general. As far as I understand, the phase difference is calculated between the same bin in two successive frames. So if you jump from frame 1 to frame 10, the difference is calculated between frames 10 and 9 rather than between frames 10 and 1. Is that not enough to calculate the frequency? Why frameaccum~? According to the formulas in the article “A Tutorial on Spectral Sound Processing Using Max/MSP and Jitter” by Jean-Francois Charles, this data should be enough:

“center frequency fc (Hz) of the frequency bin m is

fc = m × (sr/FFTSize)

assuming no more than one frequency is present in each frequency bin in the analyzed signal, its value in Hz can be expressed as

f = fc + Δφ × (sr/(2π × WindowSize))

where Δφ is the phase difference, wrapped within the range [–π, π].”
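To make the quoted formula concrete, here is a minimal Python/NumPy sketch (not taken from the article or from the Max patch). It assumes the two frames are `hop` samples apart, so it uses the hop size as the time base and first subtracts the phase advance expected for the bin centre. When the hop equals the FFT size, that expected advance is a whole number of turns and the expression reduces to the quoted formula. The names `wrap_phase` and `bin_frequency` are made up for this illustration.

```python
import numpy as np

def wrap_phase(phi):
    """Wrap phase values into the range [-pi, pi)."""
    return (phi + np.pi) % (2 * np.pi) - np.pi

def bin_frequency(m, dphi, sr, fft_size, hop):
    """Estimate the frequency (Hz) present in bin m from the phase
    difference dphi between two frames that are `hop` samples apart."""
    fc = m * sr / fft_size                       # centre frequency of bin m
    expected = 2 * np.pi * m * hop / fft_size    # phase advance of fc over one hop
    deviation = wrap_phase(dphi - expected)      # what is left over, wrapped
    return fc + deviation * sr / (2 * np.pi * hop)

# Test signal: a single 1000 Hz sine (assumed values, overlap 4).
sr, fft_size, hop = 44100, 1024, 256
true_freq = 1000.0
t = np.arange(fft_size + hop) / sr
x = np.sin(2 * np.pi * true_freq * t)

# Two windowed frames, one hop apart.
window = np.hanning(fft_size)
frame_a = np.fft.rfft(window * x[:fft_size])
frame_b = np.fft.rfft(window * x[hop:hop + fft_size])

m = int(round(true_freq * fft_size / sr))        # nearest bin to the sine
dphi = np.angle(frame_b[m]) - np.angle(frame_a[m])
estimate = bin_frequency(m, dphi, sr, fft_size, hop)
print(m, estimate)
```

The estimate lands close to the true frequency even though the sine sits between bin centres, which is the sense in which the phase difference "is enough" to find the frequency in a bin.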

I am sorry for such a long post, but I am evidently very confused about some essential basics… Thanks for any answer!

Quite subtle tutorial, of course, and interesting questions!

1. The [fft~] objects do the “analysis” part of the phase vocoder. The “re-synthesis” part, the inverse FFT, is done at the output of the [pfft~], through [fftout~]. The windowing may remind you of granular synthesis, but here it is simply the window function applied before the FFT. This windowing is more or less hidden when you use the inputs of [pfft~] (i.e. [fftin~]) to go from the time domain to the spectral domain.

2. Overlap 4 means that, in a way, you are processing 4 windows at the same time. I’m not sure I understand your question, sorry.

3. If you wanted to know where the “one frequency in the frequency bin” is, you would use the formula (it is really just a proportionality). But here, what you want to do is re-synthesize the sound. What the inverse-FFT engine wants is an x and a y (cartesian coordinates). You get them by giving polar coordinates to [poltocar~], but for that you need a phase value, not a phase difference. That is why you use [frameaccum~]: it turns the phase differences back into phases that [poltocar~] can use.
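As a rough illustration of point 3, here is a Python/NumPy sketch of the idea (not the Max objects themselves): a per-bin running sum over frames turns phase differences back into phases, and a polar-to-cartesian conversion then gives the x and y that the inverse FFT needs. The function names `frame_accumulate` and `pol_to_car` are made up here to mirror what [frameaccum~] and [poltocar~] do, and the random phases are just test data.

```python
import numpy as np

def frame_accumulate(phase_diffs):
    """Per-bin running sum over frames (axis 0 = frame, axis 1 = bin)."""
    return np.cumsum(phase_diffs, axis=0)

def pol_to_car(magnitude, phase):
    """Polar to cartesian: returns (x, y), i.e. real and imaginary parts."""
    return magnitude * np.cos(phase), magnitude * np.sin(phase)

# Made-up analysis data: 6 frames, 4 bins.
rng = np.random.default_rng(0)
phases = rng.uniform(-np.pi, np.pi, size=(6, 4))
magnitudes = np.ones_like(phases)

# Phase difference between each frame and the previous one
# (the first frame is differenced against zero).
phase_diffs = np.diff(phases, axis=0, prepend=np.zeros((1, 4)))

# Accumulating the differences gives back usable phase values.
running = frame_accumulate(phase_diffs)
x, y = pol_to_car(magnitudes, running)
```

When the frames are played back in their original order, as here, the running phase simply reproduces the analysed phase; the accumulation only starts to matter once frames are read at a different rate.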

Hope that helps a little.

You take the differences of the phases and then move between frames at a different rate, and hence the differences sum to produce a set of running phases that *are not the same* as the input (unless the speed is 1, in which case this is what we want). This makes the result quite different from SGS.

This running phase / phase difference business:

1 – is where all the problems of the phase vocoder start.

2 – is the essential difference between a kind of SGS and the phase vocoder. In theory the phase vocoder sounds smooth in a way that SGS will not, because we take phase into account in the reconstruction of the sound (and phase is a relative measure, so it’s the differences that are important). By continuing each bin according to its phase changes over time we hope to achieve something that SGS can’t.
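The difference between running phases and input phases can be seen in a small Python/NumPy sketch. This is only a toy model under stated assumptions: phases are kept unwrapped, each bin advances by a constant amount per frame, the read position is simply floored to a frame index, and `stretch_phases` is a made-up name, not anything from the patch.

```python
import numpy as np

def stretch_phases(phases, speed):
    """Read frames at `speed` (1 = original rate), take the phase difference
    between each read frame and the frame just before it, and accumulate."""
    n_frames = phases.shape[0]
    positions = np.arange(1, n_frames, speed)    # fractional read positions
    idx = np.floor(positions).astype(int)        # frame actually read
    diffs = phases[idx] - phases[idx - 1]        # successive-frame differences
    running = phases[0] + np.cumsum(diffs, axis=0)
    return idx, running

# Toy analysis data: 8 frames, 2 bins, constant phase advance per frame.
advances = np.array([0.3, 1.1])
phases = np.arange(8)[:, None] * advances

# Speed 1: the running phases are the input phases.
idx_1, running_1 = stretch_phases(phases, 1.0)

# Half speed: twice as many output frames, each bin still advances by the
# same amount per frame, but the phases no longer match the frames read.
idx_half, running_half = stretch_phases(phases, 0.5)
```

At half speed every frame is read twice, yet each bin keeps rotating at its own rate from one output frame to the next. Repeating the stored phases instead (which is closer to what a grain-based approach does) would freeze each bin for two frames at a time.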

Maybe that makes things a little clearer?

A.
