# Max 6, Tutorial 25: Using the FFT > Taking the sqrt of the envelope values…?

Dear All,

I would appreciate it if someone could explain the following from the MSP tutorial:

Notice also that because we will be applying the amplitude envelope twice (once before the fft~ and once again after the ifft~), we take the square root of the envelope values, so we do not have unwanted amplitude modulation resulting from our envelopes (we want the overlapping envelopes to crossfade evenly and always add up to 1).

…i don't understand how taking the sqrt of the amp. env. could prevent amplitude modulation, or how the envelopes would crossfade evenly and always add up to 1.

Thx for your help & concern!

Hi there,

it’s actually pretty simple…

Let’s try to contextualize it:

they are talking about a triangular amplitude envelope… what they mean is a window, in this particular case a Bartlett window. The Bartlett (or triangular) window gives perfect reconstruction with an overlap of 50%.

Perfect reconstruction means that when the windowed time slices are finally summed together all resulting samples add to a constant (1 in this case).

That's why they perform two 512-sample FFTs, one of which is offset 256 samples later than the other. That 256-sample offset (they also call it the hop size) gives the 50% overlap that will produce perfect reconstruction.

You can easily see that in the nice drawing they put right above the text snippet in question.
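You can also check the perfect-reconstruction property numerically. Here is a small sketch in NumPy (the explicitly constructed triangular window is an assumption made so the overlapping halves sum exactly):

```python
import numpy as np

N = 512          # FFT frame size
hop = N // 2     # 50% overlap, i.e. a hop size of 256 samples

# Triangular (Bartlett) window: 0 at the edges, 1 at the center
n = np.arange(N)
w = 1.0 - np.abs(n - N / 2) / (N / 2)

# Overlap-add several copies of the window at the hop size
frames = 8
out = np.zeros(hop * (frames - 1) + N)
for i in range(frames):
    out[i * hop : i * hop + N] += w

# Away from the fade-in/fade-out at the ends, every sample is
# covered by exactly two windows whose values sum to 1
steady = out[hop:-hop]
print(np.allclose(steady, 1.0))  # True
```

That constant sum is exactly what the drawing shows: wherever one triangle is going down, the next one is coming up by the same amount.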

Ok, so far so good…

Now, applying the window before the fft~ and after the ifft~ equals applying the window twice to the same signal.

Applying the window twice to the same signal equals squaring the window and applying it to the signal.

Now if you square a triangular window you no longer get a triangular window, and obviously you no longer get perfect reconstruction with 50% overlap. Without perfect reconstruction, summing the windowed time slices together gives samples of different amplitudes in some parts than in others. In other words, we get amplitude modulation artifacts.

Therefore, to compensate for this problem, they use a nifty trick: they take the square root of the window. Applying the square-rooted window twice squares it, which cancels the square root, and we get back our original triangular window, which still gives perfect reconstruction with a 50% overlap.
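The same kind of numerical check shows why the trick works. A sketch, again assuming a NumPy triangular window:

```python
import numpy as np

N = 512
hop = N // 2
n = np.arange(N)
tri = 1.0 - np.abs(n - N / 2) / (N / 2)   # triangular window

# Windowing before fft~ and after ifft~ applies the window twice,
# i.e. it squares it; the squared triangle no longer sums to 1:
squared = tri * tri
print(np.allclose(squared[:hop] + squared[hop:], 1.0))  # False

# The trick: use sqrt(tri) on each pass, so the two passes
# multiply back into the original triangle, which does sum to 1:
root = np.sqrt(tri)
applied_twice = root * root
print(np.allclose(applied_twice[:hop] + applied_twice[hop:], 1.0))  # True
```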

Is that clear now?

- Luigi

dear luigi,

firstly, pleasure to meet you! and thx much for your friendly response & explanation.

…explanations in the first paragraph understood except: "Perfect reconstruction means that when the windowed time slices are finally summed together all resulting samples add to a constant (1 in this case)"… do you mean 1 as in the proper analysis of one fft frame as a result of the overlap, or is the constant 1 a result of the analysis of all the frames in 1000 ms?

* Before i posted my question on the forum i'd been studying the FFT from Curtis Roads' Computer Music Tutorial and i didn't understand what he meant by the perfect summation criterion.

explanations in the 2nd paragraph understood, nifty trick indeed, great!

by the way, i thought it would be a great opportunity to get your thoughts on a subject i came across on the web in relation to the phase vocoder [max tut_26]. it says (at http://music.columbia.edu/cmc/musicandcomputers/chapter5/05_04.php):

By changing the length of the overlap when we resynthesize the signal, we can change the speed of the sound without affecting its frequency content (that is, the FFT information will remain the same, it’ll just be resynthesized at a "larger" frame size). That’s how the phase vocoder typically changes the length of a sound.

reading this info completely mixed up my mind, because what i understood from the patch (phase_vocoder/tut.26) is that there's no such process that changes the length of the overlap, as it's fixed to "4x", and basically the time-stretching/compression is achieved by adjusting the playback rate, which plays the frames either slower or faster…

sorry for the rather long question but was wondering what i’m missing here.

lastly, thought you might be interested in listening to a work i made with a friend,

http://www.cronicaelectronica.org/?p=cronicaster

i'm the guy on the front… hope you like it!

and would be great to listen to any of your works that you’d like to share.

thx much again for your help, cheers!

Hi Cem,

(1) ok, to understand perfect reconstruction check out this site:

http://www.katjaas.nl/FFTwindow/FFTwindow.html

Look at the 7th picture from the top to understand exactly what I mean.

In the example he is using a Hann window, but the theory behind it is the same and it applies in the same fashion to the triangular window as well.

(2) the phase vocoder article you pointed out is not very good.

There's plenty of phase vocoder material online that's better than that.

I would recommend you study the phase vocoder from somewhere else.

The overlap they talk about in the article has nothing to do with the FFT overlap.

You are right: the FFT overlap doesn’t change and in your specific example it stays fixed at 4x.

My guess is that the overlap they are talking about is the playback speed overlap.

The easy explanation is that if you want to stretch a sound without changing the pitch, you need to play back the time slices with a certain overlap. When you are done playing back a slice, you play back a portion of the same slice again, in addition to a portion of the next slice. That has the effect of moving through the sound file at a slower rate, while the playback rate of each individual slice always stays the original rate. In other words, you are repeating portions of slices as you go through the file, but the actual playback speed of the slices does not change from the original. Of course, the more overlap, the more slowly you move through the soundfile. This gives you the effect of playing back the file at a slower speed while keeping the pitch unaltered.

Of course in the real phase vocoder it is not as easy as that.

But my explanation will hopefully clear up the overall concept.
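That overall concept can be sketched in a few lines of code. This is a naive overlap-add stretch with no phase correction (a real phase vocoder also adjusts the phases between frames to avoid artifacts), and the frame and hop sizes are just illustrative values:

```python
import numpy as np

def ola_stretch(x, frame=1024, hop_out=256, stretch=2.0):
    """Read slices from the input at hop_out/stretch, write them out
    at hop_out: portions of each slice get repeated, so we move through
    the file more slowly while each slice plays at its original rate."""
    hop_in = hop_out / stretch
    win = np.hanning(frame)
    n_frames = int((len(x) - frame) / hop_in)
    out = np.zeros(n_frames * hop_out + frame)
    for i in range(n_frames):
        start = int(i * hop_in)
        out[i * hop_out : i * hop_out + frame] += x[start:start + frame] * win
    return out

sr = 44100
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of 440 Hz
y = ola_stretch(x, stretch=2.0)
print(len(y) / len(x))  # roughly 2: twice as long, same pitch per slice
```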

If you really want to understand the phase vocoder I recommend you implement one in Max/MSP.

Here is a starting point for you:

http://cycling74.com/2006/11/02/the-phase-vocoder-–-part-i/

You will learn more than in any book or article you may read.

Sorry, I am in a rush right now. I am working on a project and we are close to a deadline.

I did listen to your work, but I didn’t do it with the attention and with the time it deserves.

On a first listen, it seems interesting.

After reading your explanation of the piece I was mostly fascinated by the thought process behind it.

You seem like a very thoughtful and creative guy. Keep up the good work.

Maybe one day we’ll meet in person at a Cycling’74 Expo or something…

Cheers

- Luigi

Exactly. I think this drawing explains perfect reconstruction better than any words could ever do. Thank you.

…yes, great illustration [thx very much "t" for the post!], as well as the one from katja's site, understood!

it works out well with a 2x overlap as in the illustration, but i assume things would get messy with a 4x overlap (hopsize: 256). then, for example, the first crossing point would exceed 1 and could sum to a value of, say, 1.6… then what happens, what to do? :)
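For what it's worth, a quick numerical check (a sketch with a NumPy triangular window) suggests it doesn't actually get messy: at 4x overlap the triangles still sum to a constant, just a constant 2 instead of 1, so rescaling the output (or the window) restores unity gain:

```python
import numpy as np

N = 512
hop = N // 4              # 4x overlap
n = np.arange(N)
w = 1.0 - np.abs(n - N / 2) / (N / 2)   # triangular window

frames = 16
out = np.zeros(hop * (frames - 1) + N)
for i in range(frames):
    out[i * hop : i * hop + N] += w

# Ignore the ends, where fewer than 4 windows overlap
steady = out[N - hop : hop * (frames - 1)]
print(steady.min(), steady.max())  # 2.0 2.0 -> still constant, just not 1
```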

quote[c. roads]: any additive or multiplicative transformations that disturb the perfect summation criterion at the final stage of the OA cause side effects that will probably be audible. Time expansion by stretching the distance between windows, for example, may introduce comb filter or reverberation effects, depending on the number of frequency channels or bins used in the analysis. Using speech or singing as a source, many transformations result in robotic, ringing voices of limited use.

…i assume it's possible to connect this to what you explained previously about the length of the FFT overlap, such as: the STFT does not actually stretch the distance between windows; what actually increases the distance is the amount of overlap, as you have to repeat portions of slices, but when you do that it disturbs the perfect summation criterion, and this could be termed the result of an "additive transformation"… so have i understood correctly?
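Roads' point about stretching the distance between windows can also be checked numerically. A sketch (NumPy, triangular window): at the analysis hop of 256 the windows sum to a constant, but if the hop is stretched, say to 320 samples, the sum ripples, which is the kind of audible side effect he describes:

```python
import numpy as np

N = 512
n = np.arange(N)
w = 1.0 - np.abs(n - N / 2) / (N / 2)   # triangular window

def ola_sum(hop, frames=32):
    """Overlap-add copies of the window at the given hop size."""
    out = np.zeros(hop * (frames - 1) + N)
    for i in range(frames):
        out[i * hop : i * hop + N] += w
    return out[N:-N]                     # steady-state region only

print(np.ptp(ola_sum(256)))  # 0.0  -> constant sum, no modulation
print(np.ptp(ola_sum(320)))  # 0.25 -> stretched hop ripples (audible AM)
```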

as to the link, i really appreciate it! i also came across it, and for sure, like you stated above, it takes a lot of work to fully understand what's going on with a phase vocoder, so i will be working on that tutorial.

lastly, thx much for your nice words & feedback, glad you found the work interesting. i would also like to find out what kind of work you're doing, so please send me a link sometime if it's ok.

…all the best with your project, thx for the help, cheers!

great thread, thought to drop my question too:

if i don't do the ifft, only the fft to analyze the sound, i need to multiply with a window function, that's ok… but i'm not sure about overlapping… should i do overlapping then too?

what if i do the two ffts at 0 and 256 (on a 512-sample basis)? should i add the complex numbers at certain bins? should i calculate the magnitude from the complex numbers for both fft windows and sum them? what about the phase?

in the example the overlapping ffts don't interact, they just add again after the ifft, which is ok, but i want to build a spectrum where i don't need the ifft.

Kevin

dear kevin,

firstly, it would be great if you could specify what you want to do in terms of building a spectrum. is it for a spectrum analyzer?… note: i'm a newcomer with max and just started to learn FFT, but there are many experienced users that can help you much better than me.

In any case, i hope a recap of what i know in relation to your questions will be of help:

. OVERLAP: Overlapping helps to visualize the spectrum accurately. There's a trade-off between frequency resolution and temporal information. The larger the FFT frame, the better the frequency resolution, but the less information you get about when the frequencies occurred. So if you want to know more accurately when the frequencies occurred, you have to limit the analysis to short segments (and the windows must overlap so that you capture the signal without gaps). in your case i assume you'd need an overlap factor of 2x or 4x. Since you don't want to transform/modulate the spectrum, i think you wouldn't want too much of an overlap either, as that would also oversample the spectrum.

For example, a percussive sound would necessitate an analysis with at least four overlaps, while a reasonably static, harmonically rich sound would call for a very large FFT size.
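The trade-off is easy to put in numbers. A sketch (the 44.1 kHz sample rate is an assumed example): each doubling of the frame size halves the bin spacing but doubles the stretch of signal each frame smears together in time:

```python
sr = 44100  # assumed sample rate in Hz

for N in (256, 1024, 4096):
    bin_hz = sr / N           # frequency resolution: width of one bin
    frame_ms = 1000 * N / sr  # time covered by one analysis frame
    print(f"N={N:5d}  bin width={bin_hz:7.2f} Hz  frame={frame_ms:5.1f} ms")
```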

. The choice of frame size depends on what you need to do. As to your question (should i add complex numbers at certain bins): i'm not sure what you mean, but i don't think you need to add any complex numbers at certain bins, because you're not going to modify the signal. As to the remaining questions: i think it depends on the application and how you need to use the fft~ or pfft~ objects.

…you can attain magnitude and phase from cartopol~, as it converts the cartesian (real/imaginary) output of the fft~ into polar form.

that's all i can say for now & i hope it's been helpful.

talk soon, cheers!

thanx Cem, sorry if i wasn't clear, but i understand how fft and windowing work…

what is not clear is what i should do with the overlapped, windowed fft outputs: should i add the complex numbers they output to get one signal flow to analyze?

it's not clear how to handle the 2 FFT outputs, how i can make one signal flow (the flow of the complex numbers they output) out of the 2 fft outputs.

i don't need to do the ifft, and the max tutorial doesn't do anything with them either, it just sends them into two different IFFTs for resynthesis and sums them after…

So if i want to do a normal spectrum analyzer only (not spectral REsynthesis), should i use overlapping windows, or is one windowed FFT enough for this purpose?

hi kevin,

in your case i assume you would need to use overlapping windows, because, as i mentioned, you will also want good enough time resolution along with the frequency resolution > you'd want to visualize the spectrum accurately.

…the answer to your other questions could possibly relate to the processing of a phase vocoder: in that case you subtract the phases of the two fft outputs to attain a true frequency, and then translate these phase differences into phase values via frameaccum~, so you get one signal flow… at the end, since you don't want to resynthesize the signal, you won't send it via fftout~.
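To make the phase-difference idea concrete, here is a sketch in NumPy rather than MSP (the 445 Hz test tone and the frame/hop sizes are just example values): the phase change of a bin between two overlapped frames, beyond the advance expected for that bin's center frequency, tells you where inside the bin the actual frequency sits.

```python
import numpy as np

sr, N, hop = 44100, 1024, 256
t = np.arange(N + hop) / sr
x = np.sin(2 * np.pi * 445.0 * t)        # 445 Hz sits between bin centers

# Two overlapped, windowed FFT frames, hop samples apart
f1 = np.fft.rfft(x[:N] * np.hanning(N))
f2 = np.fft.rfft(x[hop:hop + N] * np.hanning(N))

k = np.argmax(np.abs(f2))                # strongest bin
expected = 2 * np.pi * k * hop / N       # phase advance if exactly on bin k
dphi = np.angle(f2[k]) - np.angle(f1[k]) - expected
dphi = (dphi + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)

true_freq = (k + dphi * N / (2 * np.pi * hop)) * sr / N
print(k * sr / N, true_freq)  # bin center ~430.7 Hz, estimate close to 445 Hz
```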

…it would be great to get some feedback from someone interested in this topic who would like to share his/her knowledge, cheers!