(Ambisonics) methods for encoding A-Format to B-Format

julien breval's icon

Hello,

I would like to know which techniques exist for encoding the output of an A-Format microphone (e.g. the Soundfield SPS200 or the Core Sound TetraMic) to the usual B-Format (WXYZ).

Fons Adriaensen from Parma has developed an interesting encoding software adapted to the Core Sound microphone:
http://www.kokkinizita.net/papers/tetraproc.pdf
During the encoding, it uses FFT-based convolution at two stages (see the paper for more details). I wonder what impact these FFT operations have on the recorded signal. Usually, the FFT (especially in real time) involves a time-versus-frequency trade-off that can blur sharp attacks. Perhaps there are special techniques for avoiding this, though I have not heard of them yet.

Core Sound themselves and Soundfield have made such encoders for use with their microphones, but they don't give any information about them. Moreover, I couldn't find any downloadable B-Format recording of a drum solo or a xylophone, which would show the effect that A- to B-Format encoding has on sharp attacks.

Therefore I am looking for references on this subject (books, scientific papers or web documents).

Regards,
-j

voxish's icon

Hi Julien,
My understanding is that the convolution is used to calibrate the signals from the capsules before matrixing, ensuring that the derived B-format is accurate. Check out this article by Angelo Farina.
http://pcfarina.eng.unipr.it/Public/B-format/A2B-conversion/A2B.htm
The people on the sursound list could answer your questions much better than I ever could.
https://mail.music.vt.edu/mailman/listinfo/sursound
cheers,
Jim

nanonash's icon

Hi,

A-format to B-format conversion does not require FFT; it's just a linear combination of signals.
Sorry, I don't have time to dig out the equations, but they are quite well known; look at papers by Gerzon, and also check http://blog.soundsorange.net/?cat=8 for DIY Soundfield mikes.
Cheers.

barry threw's icon

I don't believe it is strictly linear.

From what I understand there is some fairly complex filtering that
is required to make an accurate conversion.

b

On Nov 8, 2007, at 5:01 PM, Gui Pot wrote:

>
> Hi,
>
> A-format to B-format conversion does not require FFT; it's just a
> linear combination of signals.
> Sorry, I don't have time to dig out the equations, but they are quite
> well known; look at papers by Gerzon, and also check
> http://blog.soundsorange.net/?cat=8 for DIY Soundfield mikes.
> Cheers.
>
>

Barry Threw
Media Art and Technology

San Francisco, CA    Work: 857-544-3967
Email: bthrew@gmail.com
IM: captogreadmore (AIM)
http://www.barrythrew.com

Owen Green's icon

Hi Julien,

You don't need FFTs.

A-format is the four signals from the capsules - left-front/back and
right-front/back. To get B-format:

X = 0.5 ((LF - LB) + (RF - RB))
Y = 0.5 ((LF - RB) - (RF - LB))
Z = 0.5 ((LF - LB) + (RB - RF))
W = 0.5 (LF + LB + RF + RB)

(Rumsey,F., 'Spatial Audio', Focal Press, pp 113-114)
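
For the curious, that matrixing is a one-liner per channel in any language. A minimal numpy sketch (the function name and array layout are mine):

import numpy as np

def a_to_b(lf, lb, rf, rb):
    # Naive A- to B-format matrixing, straight from the equations above.
    # Each input is a 1-D numpy array holding one capsule's signal.
    w = 0.5 * (lf + lb + rf + rb)
    x = 0.5 * ((lf - lb) + (rf - rb))
    y = 0.5 * ((lf - rb) - (rf - lb))
    z = 0.5 * ((lf - lb) + (rb - rf))
    return w, x, y, z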

--
O

julienbreval wrote:
> Hello,
>
> I would like to know which techniques exist for encoding the output
> of an A-Format microphone (e.g. the Soundfield SPS200 or the Core
> Sound TetraMic) to the usual B-Format (WXYZ).

julien breval's icon

Hello,

The linear combination works, but only for frequencies below roughly 5 kHz. Above that, the microphone array can no longer be considered coincident, since the wavelengths become comparable to the spacing between the capsules. We have "Spatial Audio" (a great book) at the research centre and I will check this, but it looks like a coarse approximation.
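
(The arithmetic behind that limit, for what it's worth; the exact crossover frequency depends on the capsule spacing, which I'm only guessing at:)

c = 343.0                      # speed of sound in m/s
for f in (1000, 5000, 10000):
    print(f, "Hz ->", 100 * c / f, "cm")   # wavelengths: ~34.3, ~6.9, ~3.4 cm
# Once the wavelength shrinks toward the capsule spacing
# (a centimetre or two on these arrays), the coincident assumption fails.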

In his original tetrahedral microphone patent (which you can find on the Internet), Gerzon explains that some complex filtering is required, at least after the matrixing (the linear combinations), and he gives the phase and frequency responses of the filters for W and for X (or Y or Z). Perhaps this filter can be implemented entirely in the time domain, with a series of biquadratic filters for example, but I would have to ask a specialist about this. It's also possible to implement it by means of FFT-based convolution, which is more straightforward but requires using (unwanted) FFTs.
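
(If one goes the convolution route, scipy will do the FFT bookkeeping; a minimal sketch, assuming you already have the correction filter's impulse response as an array:)

from scipy.signal import fftconvolve

def apply_correction(signal, filter_ir):
    # Frequency-domain convolution of the matrixed signal with the
    # correction filter's impulse response; mathematically identical
    # to direct time-domain convolution, just faster.
    return fftconvolve(signal, filter_ir, mode="full")[:len(signal)]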

Farina has another approach, which consists of measuring the impulse response of a room in B-Format with the ambisonic microphone, and using it to implement the required filtering by means of a convolution. This method is interesting because it also compensates for the capsule misalignment in actual 4-capsule microphones.

Sorry if I got this completely wrong (my current understanding of the problem makes me think that I should rather work with a double-M/S system :)

Stefan Tiedje's icon

julienbreval schrieb:
> Usually, the FFT (especially in real time) involves a time-versus-
> frequency trade-off that can blur sharp attacks. Perhaps there are
> special techniques for avoiding this, though I have not heard of
> them yet.

This is not true as long as you don't process your bins. You can do the
test yourself: just send audio through a pfft~ which does nothing.
The result will be the same...

If you do processing, you have to know exactly what has to happen with
the phase part of the signal to avoid that blur...

There is no quality difference between realtime and non-realtime, but
you get a pretty big latency (the frame size) if you want high frequency
resolution...
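
If you want to check it outside Max, here is a small numpy sketch of the same experiment (periodic Hann window, overlap of 2, nothing done to the bins; my own reconstruction of a do-nothing pfft~):

import numpy as np

N = 1024                                   # frame size
hop = N // 2                               # overlap of 2
x = np.random.randn(20 * N)                # any test signal
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann

y = np.zeros_like(x)
for start in range(0, len(x) - N + 1, hop):
    frame = x[start:start + N] * w
    bins = np.fft.rfft(frame)              # "process" the bins: do nothing
    y[start:start + N] += np.fft.irfft(bins)

# Away from the first and last frames the round trip is exact:
core = slice(N, len(x) - N)
print(np.abs(x[core] - y[core]).max())     # ~1e-16, i.e. no blur at all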

Stefan

--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com

julien breval's icon

Quote: Stefan Tiedje wrote on Fri, 09 November 2007 22:58
----------------------------------------------------
> This is not true as long as you don't process your bins. You can do the
> test yourself: just send audio through a pfft~ which does nothing.
> The result will be the same...
>
> If you do processing, you have to know exactly what has to happen with
> the phase part of the signal to avoid that blur...
>
> There is no quality difference between realtime and non-realtime, but
> you get a pretty big latency (the frame size) if you want high frequency
> resolution...
----------------------------------------------------

So a large "frame size" does some (time-)averaging that blurs any sharp attack. Overlapping lots of FFT windows shouldn't change anything (it should just result in a sum of averagings), though it would have to be tested seriously. Conversely, using small windows preserves sharp attacks better, but has poor frequency resolution: there can't be any bass content because the window is too short.

Though I have to test it, I'm pretty sure there is a difference between an original sound "S" and FFT_inverse(FFT("S")), no matter how many frequency bands there are, how big the analysis window is, and how many windows you overlap.
Please correct me if I missed some important points.

Graham Wakefield's icon

You also need to make sure your window / overlap combo doesn't
introduce distortion along the way. This is often overlooked.
Summing windows with the chosen overlap should result in a constant
magnitude; if it doesn't, you'll introduce AM distortion.
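
A quick numpy check for any candidate combination (the names are mine; zero deviation in the core means constant overlap-add, i.e. no AM):

import numpy as np

def overlap_deviation(window, hop, reps=16):
    # Overlap-add `reps` copies of the window and measure how far the
    # sum is from constant, ignoring the ramp-in/ramp-out at the ends.
    n = len(window)
    total = np.zeros(n + (reps - 1) * hop)
    for i in range(reps):
        total[i * hop:i * hop + n] += window
    core = total[n:-n]
    return core.max() - core.min()

hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(1024) / 1024)
print(overlap_deviation(hann, 512))   # ~0: this combo is safe
print(overlap_deviation(hann, 600))   # clearly nonzero: would cause AM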

On Nov 9, 2007, at 1:58 PM, Stefan Tiedje wrote:

> julienbreval schrieb:
>> Usually, the FFT (especially in real time) involves a time-versus-
>> frequency trade-off that can blur sharp attacks. Perhaps there are
>> special techniques for avoiding this, though I have not heard of
>> them yet.
>
> This is not true as long as you don't process your bins. You can do
> the test yourself: just send audio through a pfft~ which does
> nothing. The result will be the same...
>
> If you do processing, you have to know exactly what has to happen
> with the phase part of the signal to avoid that blur...
>
> There is no quality difference between realtime and non-realtime,
> but you get a pretty big latency (the frame size) if you want high
> frequency resolution...
>
> Stefan
>
> --
> Stefan Tiedje------------x-------
> --_____-----------|--------------
> --(_|_ ----|-----|-----()-------
> -- _|_)----|-----()--------------
> ----------()--------www.ccmix.com
>
>

grrr waaa
www.grahamwakefield.net

Stefan Tiedje's icon

julienbreval schrieb:
> Though I have to test it, I'm pretty sure there is a difference between an original sound "S" and FFT_inverse(FFT("S")), no matter how many frequency bands there are, how big the analysis window is, and how many windows you overlap.
> Please correct me if I missed some important points.

The mathematics of fft/ifft reproduce the exact same values if you
don't change anything in between (assuming an overlap of 2 and a Hanning
window). There is no blur and no averaging. Listen to it...
The explanation lies in the phase information...

A Fourier transform of infinite length is exactly the same as the time
representation (that's Fourier's original, well-proved statement). As we
limit our time resolution with our sample rate, we can skip some
information. You can look at an FFT frame like a grain of granular
synthesis. If you don't change the pitch, the overlapping of the grains
reproduces the original signal exactly...

Stefan

--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com

julien breval's icon

(When I say "at least in the DAW", it's because the discrete Fourier transform encodes only N frequencies and phases, N being the FFT resolution; there is also some spectral smearing referred to as 'leakage'.)

julien breval's icon

Quote: julien breval wrote on Tue, 13 November 2007 13:17
----------------------------------------------------
> After this, using the FFT during a recording (like A- to B-format encoding) should be OK (as the latency is not important, and because the convolution is not an approximation in digital audio if the FFT parameters are OK), especially with Farina's calibration method.
> As each room is different, and as the microphone position can vary, one should normally do the calibration before each recording, which is not possible in everyday practice. So the current approach looks like using as many impulse responses as possible, which can help emulate the acoustics of various spaces. The only problem I see is that when recording in *your* non-anechoic room, the acoustics of your room are combined with those of the room the impulse response comes from, so it must add some "error" to the encoding. I have no idea whether this error is significant or not (I don't have such a microphone so I can't do any tests; I imagine the spatial information would be more affected, if anything, than the "omnidirectional" sum spectra of the sounds).
>
----------------------------------------------------

Actually, after reading Farina's paper again, I realise I was completely wrong. Really sorry, as it introduced some misinformation. Here is a summary of the correct version.

The IR measurements described by Farina should be performed in an anechoic room, though it's possible to record in a big space too, since you can later remove the reflections in a DAW.

1. Early calibration of each of the 4 capsules, by comparing each capsule to a reference microphone placed in the same direction as the capsule. This gives four matched capsules before matrixing.
2. Linear matrixing (see the 4 equations above in the thread).
3. Late calibration of the sound field microphone, by comparing the four matrixed signals to three different positions of the reference microphone (along the X, Y and Z cartesian directions for the X, Y and Z signals; the paper doesn't specify the procedure for W, but it is perhaps done similarly, using a matched omnidirectional reference microphone).

The eight required filters are applied to the signal by means of FFT convolution (a rough sketch of the whole chain is below).
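
For concreteness, here is how that chain could look in numpy/scipy. The IRs themselves have to come from Farina's measurement procedure, and everything here (the names, the use of fftconvolve) is just my illustrative guess at the plumbing:

from scipy.signal import fftconvolve

def conv(sig, ir):
    # FFT convolution, trimmed back to the input length
    return fftconvolve(sig, ir, mode="full")[:len(sig)]

def encode(lf, lb, rf, rb, capsule_irs, wxyz_irs):
    # 1. early calibration: one measured filter per capsule
    lf, lb, rf, rb = (conv(s, h) for s, h in zip((lf, lb, rf, rb), capsule_irs))
    # 2. linear matrixing (the equations quoted earlier in the thread)
    w = 0.5 * (lf + lb + rf + rb)
    x = 0.5 * ((lf - lb) + (rf - rb))
    y = 0.5 * ((lf - rb) - (rf - lb))
    z = 0.5 * ((lf - lb) + (rb - rf))
    # 3. late calibration: one measured filter per B-format output
    return tuple(conv(s, h) for s, h in zip((w, x, y, z), wxyz_irs))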

julien breval's icon

Most probably the same omnidirectional reference microphone can be used for the 4 measurements of step 3.

julien breval's icon

No, actually you move the position of the source (the impulse) relative to the microphones, not the relative position of the sound field microphone and the measurement microphone (a measurement microphone is always omnidirectional anyway :)

PS: how can one edit or delete posts in this forum?

Stefan Tiedje's icon

julienbreval schrieb:
> Yes, actually there is no change (besides a huge phase offset)
> between a signal and the same signal encoded with FFT and then
> decoded back to the time domain. I did a recording test and compared
> both waveforms in a DAW. As soon as the phase offset is compensated,
> the sample values of both recordings are the same (at least in the DAW).

Thanks for testing that...

> As the FFT is symmetrical at the Nyquist frequency, I chose an FFT
> size of 8192 samples, with an overlap factor of 2 and Hanning windows
> (if you use a bigger overlap factor for reducing the phase offset,
> for example in realtime use, you have to choose another window
> shape). I would be interested in knowing the optimal FFT
> size for 44100 Hz though (at 8192, how many frequencies are
> represented inside the FFT?)

Please go back and test it with an FFT size of 64 (yes, you read that
correctly). I bet the signal will still be the same, but the latency
will be only a few ms...

If you take such a small FFT size, you can't do much with it, as
each bin will cover a huge frequency range... (All frequencies are
still represented; it's still a continuum of frequencies...)

If you have done that, you could even try a frame size of 16 (the
minimum); it will divide your spectrum into 16 equally spaced bins. With
a sampling rate of 48 kHz, the first bin goes from DC to 1.5 kHz. And
it will contain some information from the neighbouring bins as well...

I never proved it myself by ear, but you could do that easily. I am
curious about the result...

julienbreval schrieb:
> ps: how can one edit or delete posts in this forum ???????????

Not wanted; we love keeping the whole history of failures, to let
newbies feel better... (we all struggle with the same problems... ;-)

Keep on experimenting...

All the best,

Stefan

--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com


julien breval's icon

Quote: Stefan Tiedje wrote on Wed, 14 November 2007 17:17
----------------------------------------------------
> Please go back and test it with an FFT size of 64 (yes, you read that
> correctly). I bet the signal will still be the same, but the latency
> will be only a few ms...
----------------------------------------------------

The only problem I see is that some low frequencies may be misanalysed. The FFT size (resolution, or number of frequency bands in the analysis) and the analysis window size seem to be closely related. I am not sure whether (FFT size) = (window size) or (FFT size) = 2 x (window size), because of the symmetry at the Nyquist frequency. In any case, with an analysis window of 64 samples, the lowest possible non-zero frequency is about 689 Hz (for a sample rate of 44100 Hz).

Then, probably the best solution for reducing latency is overlapping the analysis windows.
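
(The arithmetic, for reference; the spacing between bin centres of an N-point FFT is sr/N:)

sr = 44100
for n in (64, 1024, 8192):
    print(n, "->", sr / n, "Hz per bin")   # ~689, ~43, ~5.4 Hz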

Stefan Tiedje's icon
julien breval's icon

Hello,

Your test patch is interesting. Actually I got similar results in the DAW. In most cases, the error can be neglected (at least if there is NO transform in the spectral domain).

Coming back to the original problem of this thread, it would now be interesting to investigate how "realistic" (I know this is not the right word) an FFT convolution is: for example, when convolving a signal with a room IR to implement some kind of reverb, or when balancing two microphones using a cross-IR measurement technique. I have not practised any DFT mathematics since 2002 or 2003, so I am no longer into it. I don't have time to test it in practice tonight, but the problem remains (at least for me): what is the minimum FFT size for which the result of a convolution between two signals is perceptually satisfactory? Maybe any size can fit, but I (wild-)guess that the bigger the FFT size, the better the result... At this point I feel like I need to find a good source on DFT mathematics (perhaps I still have one).

Stefan Tiedje's icon

julienbreval schrieb:
> Coming back to the original problem of this thread, it would now be
> interesting to investigate how "realistic" (I know this is not the
> right word) a FFT convolution is (for example, when convolving a
> signal with a room IR response, for implementing some kind of reverb,
> or for balancing two microphones using a cross-IR measure technique).

OK, that's a different story. A brute-force convolution, done the
straightforward mathematical way, would need a frame size of the length
of the impulse response. That's why the convolution reverb industry
protects its different ways of dealing with this problem with tons of
patents...

A simple filter will have a short impulse response, but a real space
could have several seconds of impulse response. This would lead to very
long frame sizes... - tricky...
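
(The basic trick, stripped of all the patented cleverness, is to partition the IR; a toy offline sketch with scipy, whereas real engines do this streaming, block by block:)

import numpy as np
from scipy.signal import fftconvolve

def partitioned_convolve(x, ir, block=8192):
    # Split a long IR into short blocks, convolve with each block
    # separately, and overlap-add the delayed results. Each FFT then
    # only needs to span `block` samples instead of the whole IR.
    out = np.zeros(len(x) + len(ir) - 1)
    for i in range(0, len(ir), block):
        part = fftconvolve(x, ir[i:i + block])
        out[i:i + len(part)] += part
    return out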

Stefan

--
Stefan Tiedje------------x-------
--_____-----------|--------------
--(_|_ ----|-----|-----()-------
-- _|_)----|-----()--------------
----------()--------www.ccmix.com

julien breval's icon

For calibrating microphone capsules it's not really a problem though, even in real time. Angelo Farina explains that it's possible to do the measurements in a sufficiently big room, so that the recorded reflections are well separated from the direct (dry) impulse. Then, in a DAW, it's easy to remove the reflections. This finally gives an IR of less than, say, 8192 samples, so you can easily process the required real-time convolutions.
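
(The truncation itself is trivial; a numpy sketch, where the lengths are placeholders you'd pick by eyeballing the measured IR:)

import numpy as np

def truncate_ir(ir, keep=8192, fade=256):
    # Keep only the direct part of the measured IR and fade it out
    # before the first room reflection arrives, to avoid a click.
    out = np.asarray(ir[:keep], dtype=float).copy()
    out[-fade:] *= np.linspace(1.0, 0.0, fade)
    return out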