Detect who is speaking/playing - multiple microphones - live setup

musinou's icon

Hi!

So, I have 4 speakers/players on 4 microphones.

I want to detect who is speaking/playing. They may speak/play at the same time. It is a live setup.

Because people sometimes speak/play pp and sometimes ff, a simple threshold does not do it: when A is playing pp, the signal can be weaker than the bleed into A's microphone when B is playing ff.

I tried inverting the signals of the 3 other microphones, multiplying them by a factor (0.3), and adding them to the current microphone. Not better.
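That inverted-and-scaled subtraction can be sketched like this (a toy numpy illustration, not the actual Max patch; the 0.3 factor comes from the post, but the leakage model here is an idealized assumption):

```python
import numpy as np

def subtract_leakage(mic, others, factor=0.3):
    """Return mic minus `factor` times the sum of the other mic signals."""
    return mic - factor * np.sum(others, axis=0)

# Toy example: each mic hears its own source plus 0.3x leakage of the other.
rng = np.random.default_rng(0)
source_a = rng.standard_normal(1024)
source_b = rng.standard_normal(1024)
mic_a = source_a + 0.3 * source_b
mic_b = source_b + 0.3 * source_a

# Subtracting 0.3 * mic_b cancels the direct leakage of source_b, but it
# also re-injects 0.09x of source_a via mic_b's own leakage. With real,
# frequency-dependent and delayed leakage the residual is much worse --
# consistent with the "not better" result above.
cleaned = subtract_leakage(mic_a, [mic_b])
```

Even in this idealized case the result is only `0.91 * source_a` rather than a clean separation, and real leakage paths are filtered and delayed, which a flat gain factor cannot undo.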

I do filter out everything that is out of range. That helps a lot, but it is far from reliable.

Any idea, starting point or inspiration would be greatly appreciated!

Thank you!

bkshepard's icon

Just thinking out loud here... What if you put a lowpass filter with a pretty low cutoff frequency in front of your threshold detector? Perhaps the low frequency content due to proximity effect could help increase the accuracy of your threshold.

musinou's icon

@BKSHEPARD Thank you, that was a good bet. I tried it, but sadly, stage movements are a lot louder than the proximity effect.

Now my next bet is to filter one sound out using another. In theory, sound that is louder in one microphone than in another should stick out. Really similar to the inverting idea, but somewhat slower.

florian1947's icon

Why not, after normalizing, try delays? The closest mic will come in first.

Rodrigo's icon

edit: pretty much what Florian said.

I had a similar issue with DIY drum triggers where hits on one drum could come up as quiet hits on the other drum.

My solution was to have a lockout based on whenever an onset is detected from a single source.

Meaning if player1 claps their hand, their microphone will pick it up before the other microphones (sound travels quite slowly, relatively speaking), so based on an attack being detected on mic1, mics2/3/4 can't get triggered for 50ms or something like that. This worked on contact mics that were probably around 1 foot apart, so it should work with performers a few feet apart.
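The lockout logic itself is simple; a minimal sketch (assuming onset detection already happens per mic and reports a mic id plus a timestamp; the 50 ms window is the value suggested above):

```python
# First mic to report an onset "wins"; all further onsets are ignored
# for a lockout window. Onset detection is assumed to happen elsewhere.
class OnsetLockout:
    def __init__(self, lockout_ms=50.0):
        self.lockout_ms = lockout_ms
        self.last_winner = None
        self.last_time_ms = None

    def onset(self, mic_id, time_ms):
        """Return True if this onset is accepted (not locked out)."""
        if (self.last_time_ms is not None
                and time_ms - self.last_time_ms < self.lockout_ms):
            return False  # another mic already triggered recently
        self.last_winner = mic_id
        self.last_time_ms = time_ms
        return True

lock = OnsetLockout(50.0)
lock.onset("mic1", 100.0)  # accepted: first onset wins
lock.onset("mic2", 102.0)  # rejected: within the 50 ms lockout
lock.onset("mic3", 160.0)  # accepted: lockout has expired
```

In Max the same thing is typically done with an onset detector per mic feeding a shared gate/timer, but the decision rule is exactly this.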

Something worth trying either way.

(I also filtered the audio so it focused primarily on the frequencies that were important, so filtering out lows pre-onset detection would be wise as well).

musinou's icon

@FLORIAN1947 Thank you for your reply. Where would you put delays?

musinou's icon

@RODRIGO Hi! Funny to meet here! I'll try your attack approach. It is not for percussion, so, I wonder if it would work as well as for a drum. mmmm
Thanks!

bkshepard's icon

One of the things I have encountered when trying to use something like a delay in one microphone to control the input of another microphone is that you tend to get all sorts of unwanted results when two or more people make a sound at the same time. I'm assuming you are using directional microphones. I don't know what types, but if you're not, you might consider using a dynamic mic like a Shure SM57. Yeah, I know the audio purists don't like 'em, but they work really well as a highly-directional stage mic for live performance as long as the performer is close to the mic and their sound is on-axis with it.

My suggestion for low-frequency detection was as a side-chain input for an expander, not for filtering the low frequencies from the sound itself. You run a copy of the mic's input through a lowpass filter and use that signal to drive an expander or soft gate. When the performer is close to the mic, their sound will have both a stronger signal AND more low-frequency content. You can use that to open the channel for the mic to let the sound through. It's not always ideal, and some sounds are still likely to pass through, but with some judicious tweaking, you can get pretty close. You might also use a high-pass to prevent the rumbling of the stage movements from creating false-triggers as well.
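That side-chain routing can be sketched as follows (a rough numpy illustration; the cutoff, threshold, and envelope decay are placeholder values, not tuned recommendations):

```python
import numpy as np

# A low-passed copy of the mic signal drives a soft gate on the
# untouched signal: close-miked sources carry more low end, so they
# open the gate; distant bleed mostly does not.
def sidechain_gate(x, sr=44100, cutoff_hz=150.0, threshold=0.05):
    a = np.exp(-2 * np.pi * cutoff_hz / sr)    # one-pole lowpass coefficient
    gain = np.empty_like(x)
    lp = env = 0.0
    for i, v in enumerate(x):
        lp = (1.0 - a) * v + a * lp            # lowpass the side-chain copy
        env = max(abs(lp), 0.999 * env)        # crude peak envelope with decay
        gain[i] = min(env / threshold, 1.0)    # soft gate, 0..1
    return x * gain

# A 100 Hz tone opens the gate and passes nearly untouched; a 10 kHz tone
# of the same level barely reaches the side-chain and is attenuated.
```

The same structure maps directly onto a Max patch: a lowpass on a copy of the input, an envelope follower, and a multiply on the dry path.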

musinou's icon

@BKSHEPARD At the moment, most of the rumbling stuff seems to be in the low range, and the proximity effect is too subtle in comparison. boooh

Rodrigo's icon

Hi! I didn't immediately recognize your screen name, but small world indeed!

I mentioned onset detection because your original post said you wanted to "detect" who is speaking/playing. Something like this (not actually delaying anything in Max, but using sound's speed through the air as the natural delay) could let you know who did something first, and from there you can do whatever you want with the audio. Varying the lockout time will extend the exclusivity of "person1 is speaking".

If you want to actually extract audio from one mic with multiple simultaneous voices happening at once, that gets a lot more complicated....

musinou's icon

@RODRIGO I think you read my reply to @FLORIAN1947 thinking it was yours.
As you say, small world!

What I tried is filtering out all the other microphones' inputs. Well, it is actually an inverse FFT convolution.

So the main input gets convolved with the [!-~ 1] amplitude of the sound I want to filter out. That said, it sounds kind of OK. To make sure it works, I pushed the gain of the inputs I needed to filter out. That way it sounds horrible, but I keep only the sounds that are picked up by that microphone and not by the others. I still get false positives on breathing; to avoid that as much as possible, I made a rough filter that lets just the voice range pass. I also put a compressor after the convolution, but I am not sure it helps yet.
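A rough per-bin version of that "1 - amplitude" mask can be sketched offline (toy numpy illustration of the idea, not the actual patch; frame size, normalization, and the lack of overlap/windowing are simplifications):

```python
import numpy as np

# Attenuate bins of the main mic's spectrum where the other mic is loud:
# multiply by (1 - normalized amplitude of the other mic), the spectral
# analogue of Max's [!-~ 1].
def spectral_mask(main, other, n_fft=1024):
    out = np.zeros_like(main)
    for start in range(0, len(main) - n_fft + 1, n_fft):
        A = np.fft.rfft(main[start:start + n_fft])
        B = np.fft.rfft(other[start:start + n_fft])
        mag_b = np.abs(B)
        mask = 1.0 - mag_b / (mag_b.max() + 1e-12)  # 1 - normalized amplitude
        out[start:start + n_fft] = np.fft.irfft(A * mask, n=n_fft)
    return out
```

With `main` containing two tones and `other` containing only the second one, the second tone is nulled out of `main` while the first survives. The breathing false positives fit this picture: broadband noise in the main mic falls in bins where the other mics are quiet, so the mask lets it straight through.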

I even think I could use the same idea to put an effect on one instrument without including the sound of the instruments around it.

It is not perfect, but it is not bad. I often have noisy things that trigger false positives. Maybe I could detect whether what I get is just noise or pitched?

musinou_inverseConvolution.maxpat (attached Max patch)

Rodrigo's icon

I was trying to do a two-in-one answer, but it is late, and I am tired, so things get funny either way, hehe.

Ok, if you're trying to subtract the sound of one mic from the others, then that is definitely far more complicated, and one of the banes of "mixed music".

What kind of mics are you using? You can get a lot of mileage out of cleverly using polar patterns (if the room/setup allows). Like having mics with figure-8 patterns set up at 90 degrees from the other sources, for maximum cancellation. Same goes for speaker dispersion and placement. The more of the problem you can solve "acoustically" the better.

Depending on the sources as well, you can use less-than-optimal mic placement to focus on cancellation/isolation and use HISS Tools mic correction techniques to produce impulse responses to correct for the shittier placement.
http://eprints.hud.ac.uk/14897/

You can also do descriptor analysis to figure out if something is noise/pitch, or if you want to get really fancy a median filter to separate the noise and the pitch. Or something like what @DIGIOLOGY posted here:
https://cycling74.com/forums/monophonic-pitch-shift-real-time-timbre-neutral/

florian1947's icon

Where would you put delays?

I wouldn't put delays, but try to use the natural delays. The idea was 'first come, first served': distinguish the input closest to the source, then dim down the other inputs. This can be made to work in conference-type situations, where ideally only one source at a time is active.
I cannot imagine it working in music. Even if you succeed in 'training' each input to know which voice/instrument it serves, you will still be left with awkward ways to gate out the spill. This is bound to sound awful after all the work done.
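The 'first come, first served' decision can be sketched with a cross-correlation lag estimate between two mics (a toy numpy sketch; in a live patch you would compare per-mic onset times instead of correlating whole buffers):

```python
import numpy as np

# Estimate which of two mics the sound reached first by cross-correlating
# them and reading off the lag of the correlation peak.
def lag_samples(a, b):
    """Positive result: `a` leads `b` (the source is closer to mic a)."""
    corr = np.correlate(a, b, mode="full")
    return (len(b) - 1) - int(np.argmax(corr))

# If mic b receives the same signal 5 samples after mic a,
# lag_samples(a, b) is +5 and lag_samples(b, a) is -5.
```

The sign of the lag tells you whose input to keep and whose to dim.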

Frankly, crosstalk cancellation - which is what you're after - has to be done by selection and placement of microphones, choice of acoustic environment, plus maybe in-ear monitoring for the players if needed.

Emiliano Brescacin's icon

Hi Rodrigo, I have a similar problem and I think your solution is exactly what I'm looking for.

I have a cymbal with 2 contact mics, one at each side, and I want to detect the position where I'm hitting the cymbal: whether I'm closer to the left mic or to the right mic.

Since the cymbal is designed to spread the sound across its entire surface, loudness detection doesn't work, because the two mics report almost the same value. But we can clearly hear the position in the stereo field due to the Haas effect.
The mics are 35cm away from each other.
Is your solution able to detect this difference in time?

Can you share the patch?

Thank you.
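For scale on the 35cm question above, a back-of-envelope calculation (this assumes the delay travels through air at ~343 m/s; vibration through the cymbal metal itself travels far faster, so the real lag between two contact mics may be much smaller):

```python
# Expected inter-mic delay for two mics 35 cm apart, through air.
speed_of_sound = 343.0     # m/s in air (much higher in metal)
mic_distance = 0.35        # m between the two contact mics
sr = 44100                 # sample rate, Hz

delay_ms = mic_distance / speed_of_sound * 1000   # ~1.02 ms
delay_samples = mic_distance / speed_of_sound * sr  # ~45 samples
```

About a millisecond (~45 samples at 44.1 kHz) is detectable by cross-correlation or onset-time comparison in air, but if the wave mostly propagates through the metal, the usable lag shrinks accordingly.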