Forums > MaxMSP

(Ambisonics) methods for encoding A-Format to B-Format

November 8, 2007 | 8:55 pm

Hello,

I would like to know which techniques exist for encoding the output of an A-Format microphone (e.g. the Soundfield SPS200 or the Core Sound TetraMic) to the usual B-Format (WXYZ).

Fons Adriaensen from Parma has developed an interesting encoding application tailored to the Core Sound microphone:
http://www.kokkinizita.net/papers/tetraproc.pdf
During the encoding it uses FFT-based convolution at two stages (see the paper for details). I wonder what impact these FFT operations have on the recorded signal. Usually, the FFT (especially in real time) involves a time-versus-frequency trade-off that can blur sharp attacks. Perhaps there are special techniques for avoiding it, though I have not heard of any yet.

Core Sound and Soundfield themselves have made such encoders for use with their microphones, but they don’t give any information about them. Moreover, I couldn’t find any downloadable B-Format recording of a drum solo or a xylophone, which would show the effect that A- to B-Format encoding has on sharp attacks.

Therefore I am looking for references on the subject (books, scientific papers or web documents).

Regards,
-j


November 8, 2007 | 9:53 pm

Hi Julien,
My understanding is that the convolution is used to calibrate the signals from the capsules before matrixing, ensuring that the derived B-format is accurate. Check out this article by Angelo Farina.
http://pcfarina.eng.unipr.it/Public/B-format/A2B-conversion/A2B.htm
The people on the sursound list could answer your questions much better than I ever could.
https://mail.music.vt.edu/mailman/listinfo/sursound
cheers,
Jim


November 9, 2007 | 1:01 am

Hi,

A-format to B-format conversion does not require an FFT; it’s just a linear combination of signals.
Sorry, I don’t have time to dig out the equations, but they are quite well known; look at the papers by Gerzon, and also check http://blog.soundsorange.net/?cat=8 for DIY Soundfield mics.
Cheers.


November 9, 2007 | 3:01 am

I don’t believe it is strictly linear.

From what I understand, some fairly complex filtering is required to make an accurate conversion.

b

On Nov 8, 2007, at 5:01 PM, Gui Pot wrote:

>
> Hi,
>
> A-format to B-format convesion does not require FFT, it’s just a
> linear combination of signals.
> Sorry I dont have time to dig out the equations but they are quite
> well known, look at papers from Gerzon, also check http://
> blog.soundsorange.net/?cat=8 for DIY Soundfield mikes.
> Cheers.
>
>

Barry Threw
Media Art and Technology

San Francisco, CA Work: 857-544-3967
Email: bthrew@gmail.com
IM: captogreadmore (AIM)
http://www.barrythrew.com


November 9, 2007 | 10:13 am

Hi Julien,

You don’t need FFTs.

A-format is the four signals from the capsules – left-front/back and
right-front/back. To get B-format:

X = 0.5 ((LF – LB) + (RF – RB))
Y = 0.5 ((LF – RB) – (RF – LB))
Z = 0.5 ((LF – LB) + (RB – RF))
W = 0.5 (LF + LB + RF + RB)

(Rumsey,F., ‘Spatial Audio’, Focal Press, pp 113-114)
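In code, the four equations above amount to a simple linear matrix. A minimal sketch in Python/NumPy (the function name is illustrative, and it assumes already-matched, effectively coincident capsule signals, which as noted later in the thread only holds at lower frequencies):

```python
import numpy as np

def a_to_b(lf, lb, rf, rb):
    """Naive A-format to B-format matrixing (per Rumsey, 'Spatial Audio').

    lf, lb, rf, rb: sample arrays from the four tetrahedral capsules.
    Returns (w, x, y, z). Only valid while the array can be treated as
    coincident, i.e. below a few kHz for a real microphone.
    """
    lf, lb, rf, rb = map(np.asarray, (lf, lb, rf, rb))
    x = 0.5 * ((lf - lb) + (rf - rb))
    y = 0.5 * ((lf - rb) - (rf - lb))
    z = 0.5 * ((lf - lb) + (rb - rf))
    w = 0.5 * (lf + lb + rf + rb)
    return w, x, y, z
```

Feeding a unit impulse into a single capsule (say LF) puts equal energy into all four B-format channels, as expected from the symmetry of the matrix.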


O

julienbreval wrote:
> Hello,
>
> I would like to know which techniques exist for encoding the output
> of a A-Format microphone (eg see the Soundfield SPS200 or the Core
> Sound TetraMic) to usual B-Format (WXYZ).


November 9, 2007 | 1:32 pm

Hello,

The linear combination works, but only for frequencies below roughly 5 kHz. Above that, the microphone array can’t be considered coincident, since the wavelengths are no longer large compared to the distance between the capsules. We have "Spatial Audio" (a great book) at the research centre and I will check this, but it looks like a coarse approximation.

In his original tetrahedral microphone patent (which you can find on the Internet), Gerzon explains that some complex filtering is required at least after the matrixing (the linear combinations), and he gives the phase and frequency responses of the filters for W and for X (or Y or Z). Perhaps this filtering can be implemented entirely in the time domain, with a series of biquadratic filters for example, but I would have to ask a specialist about this. It’s also possible to implement it by FFT-based convolution, which is more straightforward but requires using the (unwanted) FFTs.
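For illustration, the time-domain option amounts to running the matrixed signals through a cascade of biquad sections. A minimal sketch in Python (the coefficients here are placeholders; deriving real ones from the responses in Gerzon’s patent is a separate filter-design problem):

```python
import numpy as np

def biquad(x, b0, b1, b2, a1, a2):
    """One biquad section, Direct Form II transposed.

    Coefficients are placeholders -- the actual W/XYZ correction
    filters would have to be designed to match Gerzon's published
    phase and frequency responses.
    """
    y = np.zeros(len(x))
    z1 = z2 = 0.0
    for i, xn in enumerate(x):
        yn = b0 * xn + z1
        z1 = b1 * xn - a1 * yn + z2
        z2 = b2 * xn - a2 * yn
        y[i] = yn
    return y

def cascade(x, sections):
    """Apply a series of biquads, one set of coefficients per section."""
    for coeffs in sections:
        x = biquad(x, *coeffs)
    return x
```

With `b0 = 1` and all other coefficients zero, each section is an identity filter and the signal passes through unchanged, which is a handy sanity check.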

Farina has another approach, which consists in measuring the impulse response of the ambisonic microphone in B-Format and using it to implement the required filtering by convolution. This method is interesting because it also compensates for the capsule misalignment in actual 4-capsule microphones.

Sorry if I got this completely wrong (my current understanding of the problem makes me think I should rather work with a double-M/S system :)


November 9, 2007 | 9:58 pm

julienbreval schrieb:
> Usually, the FFT (especially in realtime) involves a time versus
> frequency trade-off that can blur sharp attacks. Perhaps there are
> some special techniques for avoiding it, though I have not heard yet
> about them.

This is not true as long as you don’t process your bins. You can do the test yourself: just send audio through a pfft~ which does nothing. The result will be the same…

If you do processing, you have to know exactly what has to happen to the phase part of the signal to avoid that blur…

There is no quality difference between real time and non real time, but you get a pretty big latency (the frame size) if you want high frequency resolution…

Stefan


Stefan Tiedje————x——-
–_____———–|————–
–(_|_ —-|—–|—–()——-
– _|_)—-|—–()————–
———-()——–www.ccmix.com


November 10, 2007 | 1:20 pm

Quote: Stefan Tiedje wrote on Fri, 09 November 2007 22:58
—————————————————-
> This is not true as long you don’t process your bins. You can do the
> test yourself, and just send audio through a pfft~ which does nothing.
> The result will be the same…
>
> If you do processing, you have to know exactly what has to happen with
> the phase part of the signal to avoid that blur…
>
> There is no quality difference between realtime and non realtime, but
> you get a pretty big latency (the framesize) if you want high frequency
> resolution…
—————————————————-

So a large "framesize" does some time-averaging that blurs any sharp attack. Overlapping lots of FFT windows won’t change anything (it should result in a sum of averagings), though this has to be tested seriously. Conversely, using small windows preserves sharp attacks better but has poor frequency resolution, because there can’t be any bass content when the window is too short.

Though I still have to test it, I’m pretty sure there is a difference between an original sound "S" and FFT_inverse(FFT("S")), no matter how many frequency bands there are, how big the analysis window is, or how many windows you overlap.
Please correct me if I missed some important points.


November 11, 2007 | 12:38 am

You also need to make sure your window/overlap combination doesn’t
introduce distortion along the way; this is often overlooked.
Summing the windows at the chosen overlap should result in a constant
magnitude; if it doesn’t, you’ll introduce AM distortion.
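This constant-magnitude (overlap-add) condition is easy to check numerically. A sketch, assuming NumPy, with a periodic Hann window at 50% overlap (which satisfies the condition) versus an arbitrary 75% hop (which does not):

```python
import numpy as np

def cola_sum(window, hop):
    """Overlap-add shifted copies of `window` at the given hop size
    and return the steady-state region (start/end ramps excluded)."""
    n = len(window)
    out = np.zeros(n * 4)
    for start in range(0, len(out) - n + 1, hop):
        out[start:start + n] += window
    return out[n:-n]

n = 1024
# Periodic Hann window: 0.5 - 0.5*cos(2*pi*k/N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

good = cola_sum(hann, n // 2)       # hop = N/2: windows sum to exactly 1
print(np.allclose(good, 1.0))       # True -> no AM distortion
bad = cola_sum(hann, 3 * n // 4)    # hop = 3N/4: the sum ripples
print(np.allclose(bad, bad[0]))     # False -> AM distortion
```

The ripple in the second case is exactly the amplitude modulation the post warns about: it gets imprinted on the resynthesized audio at the frame rate.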

On Nov 9, 2007, at 1:58 PM, Stefan Tiedje wrote:

> julienbreval schrieb:
>> Usually, the FFT (especially in realtime) involves a time versus
>> frequency trade-off that can blur sharp attacks. Perhaps there are
>> some special techniques for avoiding it, though I have not heard yet
>> about them.
>
> This is not true as long you don’t process your bins. You can do
> the test yourself, and just send audio through a pfft~ which does
> nothing. The result will be the same…
>
> If you do processing, you have to know exactly what has to happen
> with the phase part of the signal to avoid that blur…
>
> There is no quality difference between realtime and non realtime,
> but you get a pretty big latency (the framesize) if you want high
> frequency resolution…
>
> Stefan
>
> —
> Stefan Tiedje————x——-
> –_____———–|————–
> –(_|_ —-|—–|—–()——-
> — _|_)—-|—–()————–
> ———-()——–www.ccmix.com
>
>

grrr waaa
http://www.grahamwakefield.net


November 11, 2007 | 12:48 pm

julienbreval schrieb:
> Though I have to test it, I’m pretty sure there is a difference between an original sound "S" and FFT_inverse(FFT("S")), no matter how many frequency bands there are, how big the anaylsis window is and how many window you overlap.
> Please correct me if I missed some important points.

The mathematics of fft/ifft reproduce the exact same values if you
don’t change anything in between (assuming an overlap of 2 and a Hanning
window). There is no blur and no averaging. Listen to it…
The explanation lies in the phase information…

A Fourier transform of infinite length is exactly the same as the time
representation (that’s Fourier’s original, well-proven statement). As we
limit our time resolution with our sample rate, we can skip some
information. You can look at an FFT frame like a grain of granular
synthesis: if you don’t change the pitch, the overlapping of the grains
reproduces the original signal exactly…
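This round-trip claim can be verified outside Max too. A sketch, assuming NumPy: frames windowed with a periodic Hann window at 50% overlap, transformed, inverse-transformed with nothing done to the bins, and overlap-added:

```python
import numpy as np

def fft_roundtrip(x, n=64):
    """STFT -> inverse STFT with no spectral processing.

    Hann window, hop n/2. Since the shifted windows sum to 1, the
    overlap-add output equals the input (away from the edges).
    """
    hop = n // 2
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    out = np.zeros(len(x))
    for start in range(0, len(x) - n + 1, hop):
        frame = x[start:start + n] * win
        spec = np.fft.rfft(frame)          # analysis: bins left untouched
        out[start:start + n] += np.fft.irfft(spec)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
y = fft_roundtrip(x, n=64)
# Compare away from the edges, where the windows have not yet summed to 1:
print(np.max(np.abs(x[64:-64] - y[64:-64])))  # rounding-level error only
```

Note this works even with a tiny 64-sample frame, which is the point Stefan makes below: reconstruction does not depend on the FFT size as long as the bins are not modified.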

Stefan


Stefan Tiedje————x——-
–_____———–|————–
–(_|_ —-|—–|—–()——-
– _|_)—-|—–()————–
———-()——–www.ccmix.com


November 13, 2007 | 12:17 pm


November 13, 2007 | 12:56 pm

(when I say "at least in the DAW", it’s because the discrete Fourier transform encodes only N frequencies and phases, N being the FFT size; there are also spectral artefacts referred to as ‘leakage’)


November 13, 2007 | 2:16 pm

Quote: julien breval wrote on Tue, 13 November 2007 13:17
—————————————————-
> After this, using the FFT during a recording (like A- to B-format encoding) should be OK (as the latency is not important, and because the convolution is not an approximation in digital audio if the FFT parameters are right), especially with Farina’s calibration method.
> As each room is different and the microphone position can vary, one should normally do the calibration before each recording, which is not possible in everyday practice. So the current approach looks like using as many impulse responses as possible, which can help emulate the acoustics of various spaces. The only problem I see is that when recording in *your* non-anechoic room, the acoustics of your room are combined with those of the room the impulse response comes from, so it must add some "error" to the encoding. I have no idea whether this error is significant or not (I don’t have such a microphone, so I can’t run any test. I imagine the spatial information would be more affected, if anything, than the "omnidirectional" sum spectra of the sounds).
>
—————————————————-

Actually, after reading Farina’s paper again, I realize I was completely wrong. Really sorry, as it introduces some misinformation. Here is a summary of the correct version.

The IR measurements described by Farina should be performed in an anechoic room, though it’s possible to record in a big space too, since you can later remove any reflections in a DAW.

1. Early calibration of each of the 4 capsules, by comparing each capsule to a reference microphone aimed in the same direction as the capsule. This gives four matched capsules before matrixing.
2. Linear matrixing (see the 4 equations above in the thread).
3. Late calibration of the sound field microphone, by comparing the four matrixed signals to three different positions of the reference microphone (along the X, Y and Z cartesian directions for the X, Y and Z signals; the paper gives no detail about W, but it is perhaps done similarly, using a matched omnidirectional reference microphone).

The eight required filters are applied to the signal by means of FFT convolution.


November 13, 2007 | 2:24 pm

Most probably the same omnidirectional reference microphone can be used for the 4 measurements of part 3.


November 13, 2007 | 2:35 pm

No, actually you move the position of the source (impulse) relative to the microphones, not the relative position of the sound field microphone and the measurement microphone (a measurement microphone is always omnidirectional anyway :)

PS: how can one edit or delete posts in this forum?


November 14, 2007 | 4:17 pm

julienbreval schrieb:
> Yes, actually there is no change (besides a huge phase offset)
> between a signal and the same signal encoded with FFT and then
> decoded back to time domain. I did a recording test and compared both
> waveforms in a DAW. As soon as the phase offset is compensated, the
> sample values of both recordings are the same (at least in the DAW).

Thanks for testing that…

> As the FFT is symmetrical at the Nyquist frequency, I chose an FFT
> size of 8192 samples, with an overlap factor of 2 and Hanning windows
> (if you use a bigger overlap factor for reducing the phase offset,
> for example in realtime use, you have to choose another window
> shape). I would be interested in knowing what the optimal FFT
> size is for 44100 Hz though (at 8192, how many frequencies are
> represented inside the FFT?)

Please go back and test it with an FFT size of 64 (yes, you read that
correctly). I bet the signal will still be the same, but the latency
will be only a few ms…

If you take such a small FFT size, you can’t do much with it, as
each bin will cover a huge frequency range… (All frequencies are
represented; it’s still a continuum of frequencies…)

If you have done that, you could even try a frame size of 16 (the
minimum): it will divide your spectrum into 16 equally spaced bins. With
a sampling rate of 48 kHz the first bin goes from DC to 1.5 kHz. And
it will contain some information from the neighbouring bins as well…

I never proved it myself by ear, but you could do that easily. I am
curious about the result…

julienbreval schrieb:
> ps: how can one edit or delete posts in this forum ???????????

Not wanted; we keep the whole history of failures so newbies feel
better… (we all struggle with the same problems… ;-)

Keep on experimenting…

All the best,

Stefan


Stefan Tiedje————x——-
–_____———–|————–
–(_|_ —-|—–|—–()——-
– _|_)—-|—–()————–
———-()——–www.ccmix.com




November 15, 2007 | 1:21 pm

Quote: Stefan Tiedje wrote on Wed, 14 November 2007 17:17
—————————————————-
> Please go back and test it with an FFT size of 64 (Yes you read correct)
> I bet the signal will still be the same, but the latency will be only
> some ms…
—————————————————-

The only problem I see is that some low frequencies may be misanalysed. The FFT size (the number of frequency bins in the analysis) and the analysis window size are closely related. I am not sure whether (FFT size) = (window size) or (FFT size) = 2 x (window size), because of this symmetry at the Nyquist frequency. In any case, with an analysis window of 64 samples the lowest non-zero bin sits at about 689 Hz (for a sampling rate of 44100 Hz).
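The numbers in this thread all come from one relation: the bin spacing is simply the sampling rate divided by the FFT size. A quick check in Python of the sizes mentioned above (16, 64 and 8192):

```python
# Bin spacing (Hz) = sampling rate / FFT size.
# The lowest non-zero bin sits at exactly this spacing.
sr = 44100.0
for n in (16, 64, 8192):
    print(n, sr / n)   # 2756.25, 689.0625, ~5.38 Hz
```

So at an FFT size of 64 nothing below ~689 Hz gets its own bin, and at 8192 the spacing is about 5.4 Hz, which is why the larger size costs so much more latency for its frequency resolution.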

Then, probably the best solution for reducing latency is overlapping the analysis windows.


November 15, 2007 | 1:35 pm


November 16, 2007 | 4:47 pm


November 21, 2007 | 11:42 pm

Hello,

Your test patch is interesting. Actually I got similar results in the DAW. In most cases the error can be neglected (at least if there is NO transformation in the spectral domain).

Coming back to the original problem of this thread, it would now be interesting to investigate how "realistic" (I know this is not the right word) an FFT convolution is (for example, when convolving a signal with a room impulse response to implement a reverb, or when balancing two microphones using a cross-IR measurement technique). I have not practised any DFT mathematics since 2002 or 2003, so I am no longer into it, and I don’t have time to test it in practice tonight, but the problem remains (at least for me): what is the minimum FFT size for which the result of a convolution between two signals is perceptually satisfactory? Maybe any size can fit, but my wild guess is that the bigger the FFT size, the better the result… At this point I feel I need to find a good source on DFT mathematics (perhaps I still have one).


November 22, 2007 | 11:00 am

julienbreval schrieb:
> Coming back to the original problem of this thread, it would now be
> interesting to investigate how "realistic" (I know this is not the
> right word) a FFT convolution is (for example, when convolving a
> signal with a room IR response, for implementing some kind of reverb,
> or for balancing two microphones using a cross-IR measure technique).

OK, that’s a different story. A brute-force convolution, done with a
mathematical mind, would need a frame size of the length of the impulse
response. That’s why the convolution-reverb industry protects its
different ways of dealing with this problem with tons of patents…

A simple filter will have a short impulse response, but a real space
can have several seconds of impulse response. This would lead to very
long frame sizes… tricky…
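The size constraint can be sketched directly: a plain FFT ("fast") convolution needs a transform long enough to hold the entire linear-convolution result, signal length + IR length − 1, or it wraps around; long IRs are typically handled by splitting the IR into blocks (partitioned convolution) so the frame size, and hence latency, stays small. A minimal unpartitioned version, assuming NumPy:

```python
import numpy as np

def fft_convolve(x, h):
    """Linear convolution via FFT.

    The transform must hold the full result, len(x) + len(h) - 1
    samples; anything shorter produces circular (wrapped) convolution
    and corrupts the output. Rounded up to a power of two for speed.
    """
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()
    y = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)
    return y[:n]

x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, 0.0, -1.0])   # a toy "impulse response"
print(fft_convolve(x, h))        # matches np.convolve(x, h)
```

For a several-second IR this single transform becomes huge, which is exactly the latency problem Stefan describes; the usual workaround is to process the IR in short partitions and sum the results.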

Stefan


Stefan Tiedje————x——-
–_____———–|————–
–(_|_ —-|—–|—–()——-
– _|_)—-|—–()————–
———-()——–www.ccmix.com


November 22, 2007 | 2:08 pm

For calibrating microphone capsules it’s not really a problem, though, even in real time. Angelo Farina explains that it’s possible to take the measurements in a sufficiently big room, so that the recorded reflections are well separated from the direct (dry) impulse. Then, in a DAW, it’s easy to remove the reflections. This finally gives an IR of less than, say, 8192 samples, so you can easily run the required real-time convolutions.

Please refer to this page for more details:

http://pcfarina.eng.unipr.it/Public/B-format/A2B-conversion/A2B.htm


Viewing 22 posts - 1 through 22 (of 22 total)