Detecting spatialization of binaurally recorded stereo file?
I’m working on a project in which I’d like to map the dynamics/intensity of various points of the stereo field to LED-driven lanterns. A nighttime symphony of frogs, specifically. Kind of the complement to something I’ve done a lot of before, which is chop up a jitter image into several vertical slabs, and then use the brightness/etc to control sounds.
How I thought I’d do it is something like this:
(a) play the stereo recording in max (sfplay~ or similar) – it’s quite spatial b/c it was recorded with Sonic Studios DSM mic
(b) somehow, on the fly, divide it into 14 channels, left to right (these will not be going out to actual channels, though, just used for #s).
(c) measure the intensity of each of these streams and spit out into a number, say 0.-1.
(d) translate each of these numbers into 0-255, and send each one out via Arduino Mega to 14 LEDs to control dimming.
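Steps (c) and (d) can be sketched outside of Max roughly as follows. This is only a minimal illustration in Python, assuming the 14 streams already exist as blocks of float samples in -1..1; the names `rms`, `to_pwm`, `streams` and the `ceiling` parameter are made up for the example, and the test tones stand in for real audio. It does not address step (b).

```python
import math

def rms(samples):
    """Root-mean-square level of one block of samples (floats in -1..1)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def to_pwm(level, ceiling=1.0):
    """Scale a 0..ceiling level to an integer 0..255 dimming value, clamped."""
    scaled = max(0.0, min(1.0, level / ceiling))
    return int(round(scaled * 255))

# Hypothetical input: 14 blocks, each one cycle of a sine whose amplitude
# rises from 0 (leftmost stream) to 1 (rightmost stream).
streams = [
    [(i / 13) * math.sin(2 * math.pi * n / 64) for n in range(64)]
    for i in range(14)
]

# A full-scale sine has an RMS of sqrt(0.5), so that is used as the ceiling.
pwm_values = [to_pwm(rms(block), ceiling=math.sqrt(0.5)) for block in streams]
print(pwm_values)
```

Each value in `pwm_values` is the kind of number that would be sent to one LED; how it gets to the Arduino Mega is left out here.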
I think I have most of it figured out, but I’m banging my head against how to analyze the stereo file and divert it into 14 areas.
Essentially, what I want is if there are discrete sounds/attacks at the same time say far to the left, 3/4 the way to the right, and all the way to the right, but nowhere else, that the LEDs in those locations would light up. Preferably through dimming, so they flicker.
Any and all ideas welcome. I could be missing something easy – I hope I am.
Here is a beginning, but it’s not working as expected, likely due to the fact that I don’t know anything about ambisonics!
It uses Graham Wakefield’s ambisonic externals.
----------begin_max5_patcher---------- 892.3oc0XEzjZBCE9r9qfg1iVm7RBBzdpm5sdoG6zoSTXcSGL3HY6ta2Y82d gDbcsqLDBXjdPThwjuuu2WduG9zzI9KyeHsv26ide2axjmlNYhZnpAlTe+D+ MrGVkwJTSyeSpLc2d+Y5uJ+NYVpT931T8Z3eSVNS568i5ueKSt5VtX8O2ktR pmR.MdNZlGNjpdCUcMlLG8xuQb2FtnbUU6GTOHOQs64K+0GBn9GmoF.5oVM3 ySmVcYlyXCrnmrgLhXCMpuwF7XhMA8M1.iI1PP8jMnwDaP8L1PiGQrgTSCqY CYLkEfPI8jMC34FQ58kK4g0Sl9fBj9rMK4ySRWkmjt2Cm3AERunF4bAesfk4 OaP9zKxvM4BYA+OpMApzplUzf.kTRBqdChoyCptqIIk7VIE+hAoZWErMpc0+ K7rLuuwDEmWviFXAOUTI3hRE2Fo1dgCVnbj.JRKiMIb32JbfM5FwBcSludcV Zi5BWzViDZFFe7LW0UiOyAQCYFjhB15z23Axxy258dnQNZSDNTGfAEiCzYbf tDeCsI.OjpRgLe6fJI.VKCwpRjjftqIzqulv1IumKRxu+hHMgJ6RDt6RCwUR SC4QSXq16AGSf1AtuPm8KVIAKPcOUH1Ftirf6k8XTtHszBRYggkLwZarAXZc g.U1xfVjhyjsL1Fk.eMZFqlp.I1xlwfArs+0LtXuAk9aqXGFqRsATjteHEov cwJCCWH5RaVgEpt9vTvRyZz+Ml0ZpBZpZgYMb74UAptCEJ1RuJc3BP8rGy5y aQHK6wjb4abHeapnqcL.yChCfnvVKcForkDctmntDBO+SQ74c7JOzEtigha1 lwdrqOyk0Yqps6nPK6xvplLN7DuEreefTu6qJBoIe0GKegKAFFEunDSH8H9e pjjuRqUvzOiK92+UaEUqF+z.PQ9c6VcXKO7bTdGYaRZgjKXRdt30SJ7jIcKO o7ohUMKUOxFdx17xii0f3DHNq6XJz.LgcKjnlHSNFSjQHl.SwD3LLMBCczSD fFfDz.jfKBjv8ARWSyDxsXBY.lhcJjLIaYjys2sgH2VRAa.hnNEQjQWMNJL5 fjIgMv4HBL.Qtq7F1DmDwwpDw.YhPbOlvskCH18XhzBlBPtGSz1vzUvOEzFl vtGSKZCSWAOdXaXxsE5LppBw4PBLARfSgD1DHgsDRk277z+BOze8S. -----------end_max5_patcher-----------
detecting the azimuth and/or elevation will only work by comparison; you need a reference signal of the same sound, or the source you are going to analyze must already contain a movement (of static/repeating sound material).
Hi Roman, thanks. The sound file I have has sections which are pretty static/repetitive, ie lots of clicking frogs. What do I do with that?
Everyone: what I’m trying to achieve is: imagine a sine wave panned hard left to hard right. Imagine a series of 14 lights placed left to right on the stage. Imagine the lights flickering from left to right. Now imagine that instead of a sine wave you have a stereo file creating a much more complex pattern of lights.
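For the panned sine wave case only, the idea can be sketched as below. This is a rough illustration in Python, assuming the position is encoded purely as a level difference between the two channels (simple linear amplitude panning); `panned_sine`, `pan_position` and `light_levels` are hypothetical names for the example. It does not use time or spectral cues, and it assigns the whole block to a single light, so it cannot separate several simultaneous sounds.

```python
import math

NUM_LIGHTS = 14

def rms(samples):
    """Root-mean-square level of one block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def panned_sine(pan, length=64):
    """Test signal: one sine cycle, linearly panned (0.0 = left, 1.0 = right)."""
    tone = [math.sin(2 * math.pi * n / length) for n in range(length)]
    left = [(1.0 - pan) * s for s in tone]
    right = [pan * s for s in tone]
    return left, right

def pan_position(left, right):
    """Estimate position from channel levels: 0.0 = hard left, 1.0 = hard right.
    Returns None when both channels are silent."""
    l, r = rms(left), rms(right)
    if l + r == 0.0:
        return None
    return r / (l + r)

def light_levels(left, right):
    """Return 14 brightness values (0..1) with the block's level in one zone."""
    levels = [0.0] * NUM_LIGHTS
    pos = pan_position(left, right)
    if pos is None:
        return levels
    zone = min(NUM_LIGHTS - 1, int(pos * NUM_LIGHTS))
    levels[zone] = min(1.0, math.hypot(rms(left), rms(right)))
    return levels

# Sweep the test tone from left to right and print which light responds.
for pan in (0.0, 0.25, 0.75, 1.0):
    left, right = panned_sine(pan)
    levels = light_levels(left, right)
    print(pan, levels.index(max(levels)))
```

Running the sweep lights zone 0 for hard left and zone 13 for hard right, with intermediate pans landing in between.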
Is this possible?
the mic you are using is a stereo mic, so you can’t decode it into multiple channels – any sense of spatialization is a perception induced by interaural time differences and level differences, plus possibly some spectral cues due to the design of the mic (pseudo-binaural);
in order to do what you want with the current recording you have, you would have to get the computer to mimic the way the brain processes these cues before the computer could ‘tell’ where the sound is coming from. This is not possible unless you have a few decades of research time spare.
My advice is to go back to the pond with an ambisonic mic, or perhaps even better, place a number of omni mics around the area, and do a multitrack recording, one track per led light.
Thanks. I was afraid of this. Unfortunately, the recording is from rural Thailand, so I won’t be getting back there anytime soon.
I’d hoped that there might be an algorithm that *could* decode those subtle time differences as well as differences in level; I guess I’m happy that brains are still more powerful by far than computers, but it would be cool to translate stereo spatialization visually.
Oh well. Thank you for saving me more time trying to track down the impossible!
the impossible ideas are always the best…