FFT and IFFT question
if an fft is performed on a peice of audio, then an inverse fft is performed, the original audio is reproduced perfectly, but i've never understood why this is - where in a series of frequency bins is the information about time retained? i would have expected the frequencies detected to be smeared across the entire period being analysed. why doesn't this happen?
and would there be any way to intentionally acheive this smearing, other than having a massive bank of oscillators equal to the number of bins?
Wihin a single frame (time duration) of an FFT, there is indeed no time information, i.e. the frequncy information is smeared across the entire frame. The process is a bit more complicated in order to reflect changes over time.
In essence, a number of short-duration FFTs are taken one after another in sequence and the changes in the amplitude & phase of the frequency bins between each of these frame consitututes the changing information over time. The longer each of the frames, the better the frequency resolution but the more the temporal smearing (i.e. poorer time resolution) and vice versa. Getting a good FFT of a quickly-changing signal is a tradeoff between frequency and temporal accuracy.
The disjunctions across frame edges are handled by 'windowing' each frame to shape the amount and character of the overlap between successive frames - altering the quality in various ways.
I'd reccomend a perusal of any good computer music or digital audio text - Roads, Moore, and Dodge/Jerse are good starting points.
but i'm not doing a partitioned fft - just taking the entire contents of a buffer~ and doing a single very big fft. maybe i should say dft, because the fftw library i'm using (http://www.fftw.org/) doesnt even seem to require that the size is a power of 2... hmm. maybe fftw is doing something internally that i'm unaware of
pete schrieb:
> if an fft is performed on a peice of audio, then an inverse fft is
> performed, the original audio is reproduced perfectly, but i've never
> understood why this is - where in a series of frequency bins is the
> information about time retained?
The short answer is: the information is in the phase, or more exactly in
the difference between the phase value of a specific bin in consecutive
frames...
Stefan
--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com
ah, i suspected it might be something to do with that.. so if i set the phase of everything to 0 before doing the IFFT, then it will be entirely "smeared"? i'll give it a go.
Does that mean that the phase difference is interpolated
when run through the IFFT?
Anthony Palomba schrieb:
> Does that mean that the phase difference is interpolated
> when run through the IFFT?
Think of it as if there are two sine waves with a small difference in
frequency (within the band of a single bin). Then you overlap them with
the window function... That creates something like an interpolation, but
not exactly... ;-)
Stefan
--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com
> Think of it as if there are two sine waves with a small difference in
> frequency (within the band of a single bin). Then you overlap them with
> the window function... That creates something like an interpolation, but
> not exactly... ;-)
hang on, there is no window function in what i'm talking about (i think) just a single fft..
> Wihin a single frame (time duration) of an FFT, there is indeed no
> time information, i.e. the frequncy information is smeared across
> the entire frame.
i wouldn't say this is entirely true.
the time information is in there (otherwise you couldn't transform
the data back without loss),
but you can't "easily see" it.
it's the same as when you try to "see" the frequency information in
a time domain representation.
in both cases all information is there, but you use different kind of
representations to have access to what you are interested in.
one way to get rid of the time information is indeed to zero-out the
phases.
but then you will loose a lot of the frequency information, too.
(phase is a complex animal...)
the resulting sound will always have a harmonic spectrum based on the
delta bin freq of the fft (with an fft size of 1024 this will be
around 43 Hz).
if you are talking about a really large fft/dft, then this could
work, as your frequency bin resolution is very high - resulting in a
very low fundamental.
volker.
> if you are talking about a really large fft/dft, then this could
> work, as your frequency bin resolution is very high - resulting in a
> very low fundamental.
it is a very large fft indeed. the trouble is, removing the phase resulted in a big attack at the start of the IFFT audio result, since all the frequencies were in phase with each other initially. i've also tried randomising the phase of each bin, which works really well, but is still not perfect because it seems that time based information is still there in some way, even if a random way, in the interactions between frequencies - destructive/constructive interference etc.
what i really want is for the resulting sound to remain as static as possible, as if there were a bank of unchanging oscillators as big as the number of frequency bins, with each its amplitude set by the magnitude from the fft. of course, there would still be interference between frequencies here, but not in a "set" repeating pattern of the length of the original audio.
On 09 Jul 2008, at 13:06, pete wrote:
>
>> if you are talking about a really large fft/dft, then this could
>> work, as your frequency bin resolution is very high - resulting in a
>> very low fundamental.
>
> it is a very large fft indeed. the trouble is, removing the phase
> resulted in a big attack at the start of the IFFT audio result,
> since all the frequencies were in phase with each other initially.
> i've also tried randomising the phase of each bin, which works
> really well, but is still not perfect because it seems that time
> based information is still there in some way, even if a random way,
> in the interactions between frequencies - destructive/constructive
> interference etc.
>
> what i really want is for the resulting sound to remain as static
> as possible, as if there were a bank of unchanging oscillators as
> big as the number of frequency bins, with each its amplitude set by
> the magnitude from the fft. of course, there would still be
> interference between frequencies here, but not in a "set" repeating
> pattern of the length of the original audio.
>
yes, randomizing the phases is the better way. i have just tried it
and it sounds very nice.
as i said, when you mess with the phases, you will loose frequency
information, resulting in a purely harmonic sound with a fundamental
corresponding to the size of the fft (number of bins). if the fft is
large you might get a good enough frequency resolution cause the
fundamental is very low. but although the periodicy of the harmonic
sound might not be perceived as an audible pitch, it still repeats
periodically, i.e. you hear it as a repeating rhythm.
is that what you are talking about?
maybe you can upload the original and processed sound somewhere to
compare results.
volker.
Ahh yes, sorry for mis-stating the case. In my haste I conflated frequency and phase.
> yes, randomizing the phases is the better way. i have just tried it
> and it sounds very nice.
>
> as i said, when you mess with the phases, you will loose frequency
> information, resulting in a purely harmonic sound with a fundamental
> corresponding to the size of the fft (number of bins). if the fft is
> large you might get a good enough frequency resolution cause the
> fundamental is very low. but although the periodicy of the harmonic
> sound might not be perceived as an audible pitch, it still repeats
> periodically, i.e. you hear it as a repeating rhythm.
>
> is that what you are talking about?
> maybe you can upload the original and processed sound somewhere to
> compare results.
processed (looped, with overlapped windowing):
http://www.thirdmeaning.net/misc/outputtest.wav
you can hear the "pulsing" which is the length of the file, which you would of course expect, but the question is, is there a way to avoid this, to just generate the frequencies at the right magnitudes infinitely, rather than just creating something the same length as the input and then looping it.
if so then it would still have periodicity as you say, but i think it would be far less obvious, and very large.
>
> original:
> http://www.thirdmeaning.net/misc/guitarscale.wav
>
> processed (looped, with overlapped windowing):
> http://www.thirdmeaning.net/misc/outputtest.wav
>
> you can hear the "pulsing" which is the length of the file, which
> you would of course expect,
the pulsing i hear in your processed sound is much shorter than the
size of the original file.
so i guess there's still something wrong in how you calculate the thing.
compare it to this one:
http://www.esbasel.ch/Downloads/outputtest-vb.wav
this has a period of roughly 3 secs, since the fft routine i use can
only process file lengths that are powers of two.
> but the question is, is there a way to avoid this, to just generate
> the frequencies at the right magnitudes infinitely, rather than
> just creating something the same length as the input and then
> looping it.
>
hm, you would need an oscillatorbank or a realtime version of the
ifft (of the same size as your fft).
then find the peaks and calculate the true frequencies (from phase
increment or magnitudes) etc.
quite expensive for large fft sizes.
if you don't take the infinity too seriously, you could also use an
fft which is even larger than your file, and zero-pad the rest of the
frame.
this file is processed from a frame size of 524288 samples (~12sec.)
http://www.esbasel.ch/Downloads/outputtest-vb2.wav
volker.
pete schrieb:
> you can hear the "pulsing" which is the length of the file, which you
> would of course expect, but the question is, is there a way to avoid
> this, to just generate the frequencies at the right magnitudes
> infinitely, rather than just creating something the same length as
> the input and then looping it.
If you randomise the phase each time you play the complete frame it
should not pulse any more if you have the ideal windowing. But maybe you
need a really good random generator, which is the domain of Peter
Castine's objects...
Stefan
--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com
> the pulsing i hear in your processed sound is much shorter than the
> size of the original file.
> so i guess there's still something wrong in how you calculate the thing.
maybe it's because of my overlapping and adding with a window?
>
> compare it to this one:
> http://www.esbasel.ch/Downloads/outputtest-vb.wav
>
> this has a period of roughly 3 secs, since the fft routine i use can
> only process file lengths that are powers of two.
yes, this sounds much better. how are you looping the result? (wouldnt there be a click if you just looped it straight?)
> hm, you would need an oscillatorbank or a realtime version of the
> ifft (of the same size as your fft).
> then find the peaks and calculate the true frequencies (from phase
> increment or magnitudes) etc.
> quite expensive for large fft sizes.
sounds difficult..
> if you don't take the infinity too seriously, you could also use an
> fft which is even larger than your file, and zero-pad the rest of the
> frame.
this sounds like a very good idea, so the magnitudes would remain unnaffected, except they will all be proportionally lower?..
> this file is processed from a frame size of 524288 samples (~12sec.)
> http://www.esbasel.ch/Downloads/outputtest-vb2.wav
that's really good, just what i'm looking for! thanks for your help, i'm finding this stuff really interesting to play around with.
> If you randomise the phase each time you play the complete frame it
> should not pulse any more if you have the ideal windowing. But maybe you
> need a really good random generator, which is the domain of Peter
> Castine's objects...
one thing i did try was generating 10 (for example) different random versions, then when looping them, randomly choose a different one each time. there was still pulsing, which i guess is because my windowing function isnt right, but could also be because of a poor prng - i'm just using the rand() standard C function
On 09 Jul 2008, at 19:20, pete wrote:
>
> yes, this sounds much better. how are you looping the result?
> (wouldnt there be a click if you just looped it straight?)
no you don't need no windowing, just loop the processed buffer -
that's the cool thing about it. since all the harmonics are integer
multiples of a (very low) fundamental, there can't be clicks.
at least this is true for the power of 2 buffer length i'm using.
don't know about fftw - should be the same though.
>
>> if you don't take the infinity too seriously, you could also use an
>> fft which is even larger than your file, and zero-pad the rest of the
>> frame.
>
> this sounds like a very good idea, so the magnitudes would remain
> unnaffected, except they will all be proportionally lower?..
don't know exactly what you mean by "porportionally lower".
magnitudes will be more or less the same, just more...
zero padding is a common trick in dsp to enhance the frequency
resolution.
with larger fft sizes you get more bins over the same frequency range
(0 -> SR),
i.e. a lower fundamental and thus a longer loop if you transform it
back to time.
>
>> this file is processed from a frame size of 524288 samples (~12sec.)
>> http://www.esbasel.ch/Downloads/outputtest-vb2.wav
>
> that's really good, just what i'm looking for! thanks for your
> help, i'm finding this stuff really interesting to play around with.
fine.
this is a single fft frame transformed back into the time domain.
should be seemlessly "loopable".
volker.
On 09 Jul 2008, at 19:01, Stefan Tiedje wrote:
>
> If you randomise the phase each time you play the complete frame it
> should not pulse any more if you have the ideal windowing.
this would require to calculate the ifft in realtime, which for
_large_ fft sizes is not practical.
v
actually, i'm having trouble replicating your ~12 second version - it comes out like this:
http://www.thirdmeaning.net/misc/outputtest3.wav
as if the phases are lining up a bit at the start and end.
i'm doing this for each bin, does it look right?:
mag = sqrt((fftresult[i][0] * fftresult[i][0]) + (fftresult[i][1] * fftresult[i][1]));
phase = (PI*Random()) - (PI/2.0f);
fftresult[i][0] = mag * (cos(phase));
fftresult[i][1] = mag * (sin(phase));
> mag = sqrt((fftresult[i][0] * fftresult[i][0]) + (fftresult[i][1] * fftresult[i][1]));
> phase = (PI*Random()) - (PI/2.0f);
> fftresult[i][0] = mag * (cos(phase));
> fftresult[i][1] = mag * (sin(phase));
just in case anyone is interested, it turns out the problem with the above is that phase should be in the range -PI to PI, rather than -PI/2 to PI/2 like i was doing for some reason.
This is a great thread, I have always wanted to know more
about the nuts and bolts of FFT/IFFT.
Are you guys doing all this in a Max patch or a source code
external? If there is an example patch, can someone please post it?
> Are you guys doing all this in a Max patch or a source code
> external? If there is an example patch, can someone please post it?
sorry, i'm doing it in an external - i wouldn't know where to begin doing it as a patch. the msp fft objects seem to be geared towards "real-time" frequency analysis, by doing partitioned ffts (short-time fourier transform?), rather than letting you move things around in buffers outside of audio time.. if you see what i mean.
source code is here if you want it:
http://www.thirdmeaning.net/misc/staticresynth~.c
it just needs the fftw library (http://www.fftw.org/), and this random number stuff:
http://www.cs.wm.edu/~va/software/park/
Thanks Peter, I will download fftw and give your source
a try.
if you're on windows then i can just give you the compiled object to save you doing it, i can't compile it for mac though.
A multiple frame approach using FFTease external resent~ can be heard here:
These run in real-time on my 2 GHz MacBook at around 20-25% of CPU.
Volker - I'm curious as to how you generated your very nice examples. Since you used a 524288 FFT size, which exceeds the limits of both pfft~ and fft~, did you write your own external, or use a different program than MaxMSP?
Eric
On 11 Jul 2008, at 17:24, Eric Lyon wrote:
>
> A multiple frame approach using FFTease external resent~ can be
> heard here:
>
> http://tinyurl.com/6jzu63
>
> These run in real-time on my 2 GHz MacBook at around 20-25% of CPU.
resent~ sounds good. have to look into it.
>
> Volker - I'm curious as to how you generated your very nice
> examples. Since you used a 524288 FFT size, which exceeds the
> limits of both pfft~ and fft~, did you write your own external, or
> use a different program than MaxMSP?
ja, fft~/pfft~ won't do that.
i use a custom external (vb.fftbuf) that performs an FFT/iFFT on the
contents of a buffer~, and did the phase randomization in max.
the external uses apple's vDSP lib, so there is no windows version.
http://www.esbasel.ch/Downloads/MaxMSP-Objects.htm
the source is available to anybody who is interested (drop me a mail).
but the external itself is pretty trivial.
BUT, i've just had a look into jit.fft, which seems to be happy with
large fft sizes (524288 doesn't seem to be a problem).
i'll have a go, and see if my poor jitter skills turn up anything
useful...
volker.
Quote: Eric Lyon wrote on Fri, 11 July 2008 09:24
----------------------------------------------------
> A multiple frame approach using FFTease external resent~ can be heard here:
>
> http://tinyurl.com/6jzu63
>
> These run in real-time on my 2 GHz MacBook at around 20-25% of CPU.
>
> Volker - I'm curious as to how you generated your very nice examples. Since you used a 524288 FFT size, which exceeds the limits of both pfft~ and fft~, did you write your own external, or use a different program than MaxMSP?
>
> Eric
>
>
----------------------------------------------------
Hey Eric, I did not know FFTease could do something like
that. In future releases of FFTease, it would be good to
include example transformations, like the ones we are
doing. They go a long way in conveying to people what the
different externals are capable of. Can you post the patch that
you used to create these?
Well shut my mouth! There are in fact sound examples...
http://www.sarc.qub.ac.uk/~elyon/LyonSoftware/MaxMSP/FFTease/
>
> BUT, i've just had a look into jit.fft, which seems to be happy
> with large fft sizes (524288 doesn't seem to be a problem).
> i'll have a go, and see if my poor jitter skills turn up anything
> useful...
>
ok, here is a version that uses jit.fft and kind of works.
seems you have to press "output" twice when one changes the fft size.
at least on my poor ppc laptop.
not so fast...
any "jittery" person tell me, if you find stupid things.
volker.
/* beware of loadbangs */
On 14 Jul 2008, at 09:54, Stefan Tiedje wrote:
>
> For Max 5 I had to use a size message to a buffer~ object to get
> the output buffer set. Don't know why...
thanks, stephan.
no max5 here to test. but i guess it's a bug introduced in 5.
volker.
On 15 juil. 08, at 19:04, Stefan Tiedje wrote:
> In another thread it was mentioned, that the sizeinsamps message
> doesn't work if you don't specify a size of the jit.buffer~ on
> instatiation. Seems a known issue, easy to work around...
known and already fixed for the next incremental. In the meantime, you
can still use the size message.
Thanks for your patiente.
ej