Understanding Impulse Response
Hi all,
I'm trying to get a better understanding of how impulse response measurements work and how they are applied to convolution reverbs.
Here's my understanding currently:
Measure IR of a system
x(t) * h(t) = y(t)
In this case, x(t) is a log sweep that is played through the system and also looped back as a reference signal. y(t) is the recorded output from the system.
By Laplace transform:
X(s) H(s) = Y(s)
H(s) = Y(s) / X(s)
In Max, this is the equivalent of running an FFT on the recorded buffers (i.e. x(t) and y(t)) to extract magnitude and phase values for each FFT bin. Then divide the spectrum of y(t) by that of x(t) (the magnitudes divide; the phases subtract). Write the resulting magnitude and phase values into a new buffer, which in effect becomes h(t). Store h(t) as a .wav file.
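In numpy terms (just to make the math concrete - this is a sketch with made-up file names, not how I'd actually patch it in Max), I imagine the measurement step like this:

```python
import numpy as np

# hypothetical raw float32 recordings of the reference sweep and the room's answer
x = np.fromfile("sweep_reference.raw", dtype=np.float32)  # x(t), the looped-back sweep
y = np.fromfile("system_output.raw", dtype=np.float32)    # y(t), what the mic recorded

n = len(y)                         # FFT length long enough for the whole response
X = np.fft.rfft(x, n)
Y = np.fft.rfft(y, n)

H = Y / (X + 1e-12)                # complex division: magnitudes divide, phases subtract
h = np.fft.irfft(H, n)             # h(t), the impulse response, ready to save as a .wav
```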
Convolve IR as reverb
x(t) * h(t) = y(t)
This time, x(t) is the audio track and h(t) is the buffer derived previously. y(t) is the post-convolution reverb sound that is heard.
Similar to the previous step, run an FFT on x(t) and h(t), but this time multiply their spectra (magnitudes multiply, phases add). Write these values into a new buffer and play back this buffer.
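And the playback step, again just as a rough numpy sketch (made-up file names):

```python
import numpy as np

x = np.fromfile("dry_track.raw", dtype=np.float32)  # the audio track
h = np.fromfile("ir.raw", dtype=np.float32)         # the impulse response from above

n = len(x) + len(h) - 1            # zero-pad so the bin-wise multiply is a true (linear) convolution
Y = np.fft.rfft(x, n) * np.fft.rfft(h, n)           # complex multiply: magnitudes multiply, phases add
y = np.fft.irfft(Y, n)             # y(t), the reverberated signal
```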
Is my understanding correct...?
i don't speak math, but the process is done to *each* of the input samples and not only to the stream.
where *each* means you have to do it with a scrolling buffer of the input stream, which is always as long as the IR file you currently use.
What is "the process" in this case?
i think you could say ... more or less ... FIR.
first send samples 1 - n through a [buffir~] object, then 2 - n+1, and so on ... that should more or less be a rudimentary version of what happens.
but it would be silly to impossible to implement it like that for a realtime application.
It's an FIR in the time domain but it's super expensive that way. That's why they turn it to FFT first.
http://www.dspguide.com/ch18/2.htm
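Written out naively (numpy here, purely illustrative), the time-domain version is just a double loop, which is why it blows up for long IRs:

```python
import numpy as np

def fir_time_domain(x, h):
    """Direct-form FIR / convolution: each output sample is a dot product of the
    last len(h) input samples with the impulse response. Cost is roughly
    len(x) * len(h) multiply-adds - fine for short filters, hopeless for a
    multi-second IR."""
    y = np.zeros(len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y
```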
I believe the way to do it is to split the IR into vector-sized chunks, do the convolution on each chunk in the FFT domain and then add them all together.
There was a time I did this by hand at school but I forget exactly how to do it now.
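Something along these lines, I think - a rough numpy/scipy sketch that splits the IR rather than the input (since convolution is linear, the per-chunk results just sum back to the full one):

```python
import numpy as np
from scipy.signal import fftconvolve

def chunked_convolve(x, h, block=512):
    """Split the IR into block-sized chunks, convolve the input with each
    chunk in the FFT domain (fftconvolve), then add each piece back in at
    the chunk's original offset."""
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(h), block):
        part = fftconvolve(x, h[start:start + block])
        y[start:start + len(part)] += part
    return y
```

(A real-time engine also chops the input into blocks and reuses the FFTs instead of redoing them per chunk, but the idea is the same.)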
yeah, they use chunks, but also different kinds of compression or rate reduction for realtime.
it is rocket science, we should not build it in max.
well, maybe next decade. ;)
We should not build it in max!?!?!!!??
there are already externals for it. And it comes in m4l.
Joel, convolution is a time-domain operation. Indeed, a convolution reverb applies the impulse response to each sample in the input stream. It's interesting to look at 2d versions of convolution: for instance, this web page allows you to play with the convolution kernel: https://setosa.io/ev/image-kernels/
In audio, the equivalent is 1d, and the kernel you play with is the impulse response.
So, you can experiment and compute a convolution reverb completely in time domain.
Then, you can also switch momentarily to Fourier domain to compute the convolution more easily (a convolution in time domain is a multiplication in Fourier domain).
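A quick numpy check of that equivalence, with random signals just for illustration:

```python
import numpy as np

x = np.random.randn(1024)     # "audio"
h = np.random.randn(256)      # the 1d kernel, i.e. the impulse response

y_time = np.convolve(x, h)    # convolution in the time domain

n = len(x) + len(h) - 1
y_freq = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)  # multiplication in the Fourier domain

print(np.allclose(y_time, y_freq))   # True, up to rounding
```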
Maybe you already had a look, but the HISSTools are available in the Package manager.
that there is already an external is not the main reason - you can still build stuff yourself, and sometimes it is even better. *)
the problem is more that one second of true stereo IR reverb would require you to run 164,000 or 384,000 parallel instances of buffir~, and there is no CPU yet which could do that.
you can do that for an amp sim with 64 samples or so, but not with reverb.
*) of course, before you create something better than, say, multiconvolve~ from the HISSTools, you should recreate it with vanilla objects, i.e. using pfft~.
this is the only patch i know; it offers a maximum of 0.1 seconds, and you can forget it for realtime.
http://www.pescadoo.net/annexe/max/
Great discussion and thanks for the linked resources, all.
The resource that AudioMatt shared seems to indicate a very similar approach to the mathematical process I described. But I'm really struggling to wrap my head around the implementation difference between a "static" (as in no variation in the time domain) FIR filter vs. a convolution reverb (which has varying mag and phase over time)...
--> I suppose theoretically you could map Mag and Phase vs Time for each fft bin and then apply that to an incoming audio stream as sort of a "time-varying FIR"? Would this constitute a conv reverb...?
Also, could you clarify why parallel processing is needed?
--> My understanding is the convolution reverb "resolution" is limited by the incoming buffer window size. For example, using a relatively small buffer like 128 samples will yield very poor frequency resolution, and therefore reverb granularity is compromised. Conversely, increasing the buffer size will yield better resolution at the expense of increased latency. Processing multiple buffer segments in parallel can improve both resolution and latency - however, this is only essential for real-time applications; it's less critical for, say, post-processing in a DAW.
"is limited by the incoming buffer window size."
the resolution - or better, the quality - depends on the sampling rate and bit depth (of both sides).
"For example, using a relatively small buffer like 128 samples will yield very poor frequency resolution"
yes but no.
first of all, the length of the IR file is equal to the length of the response.
the reverb will only be 128 samples long if you use only 128 samples (and could hardly be called a reverb anymore)
and no matter how you program it, the longer the reverb tail is, the more CPU will be used to calculate it onto a track.
A typical convolution reverb uses a "static" finite impulse response: the reverb is a "static" characteristic of the given space, you usually use a single impulse response for it (even if this single impulse response is built from different recordings). If you work at, say, 44.1kHz, and want a reverb with a 6 second decay, your impulse response will be more than 250,000 samples.
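To be exact, 6 s × 44,100 samples/s = 264,600 samples per channel. Here is a tiny numpy illustration (noise instead of real audio) of why one static FIR of that length is already a 6-second reverb tail:

```python
import numpy as np

sr = 44100
x = np.random.randn(sr)            # 1 second of dry input
h = np.random.randn(6 * sr)        # a "6 second IR": 264,600 samples, one static FIR

y = np.convolve(x, h)              # nothing time-varying anywhere
print(len(y) / sr)                 # ~7.0 s: the output keeps ringing for the full IR length
```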
I'm still not getting this...
Having a very long IR buffer only gives higher frequency resolution (i.e. FFT bins become smaller) - it doesn't have any impact on time-variant response. For example, if I analyzed this 6s buffer, I would get one magnitude and phase value for each FFT bin. Multiplying this magnitude and phase value with an incoming audio signal constitutes an FIR filter, not convolution reverb. Reverb requires a time-variant response for each FFT bin.
The only way I can think of to make it work would be to divide that 6s buffer into, for example, 1s segments and then cascade these segments sequentially. In other words, first capture a 1s buffer of the incoming audio signal, and then convolve it with each of those broken-up IR buffers in sequence. This would give a time-variant response. Even then, the time-domain resolution would be very poor (i.e. 1s resolution), which wouldn't result in a natural-sounding reverb...
unfortunately i am not able to explain it properly, but what you describe does not match my practical experience.
if you record an impulse response which is 6 seconds long and you preprocess the file to make it ready for an IR reverb, the output will be 6 seconds long, too, and it will - surprise - produce a 6-second-long reverb when you use it.
it is with an ordinary FIR filter that (only) the quality rises the more samples you use, not with a deconvolved IR.
i wonder which reverb you've used that does it like you describe? don't get confused by software which is able to time-scale the IR file after loading it.
"1s segments and then cascade these segments sequentially"
buffir~ has a 512-sample limit, so you need 87 filters per second of IR length.
or actually 174, because you always need 2 in parallel and then do windowing.
that is why it is so hard to imagine how to do it in max: because it is almost impossible. :)