Real-time subtraction in the frequency domain

Markus Baumknecht

hi guys!

I'm building a room installation in which a constant background sound is played (noise, music, anything), and when someone in the room starts talking, the sound subsides and the speaker's voice is recorded.

the way I'm thinking about it, I'd need some kind of 'threshold' mechanism that can recognise when someone is speaking even against the backdrop of the ambient noise/music I'm playing. I'm guessing that the playback volume in the room will be high enough to drown out quieter speakers, which makes a simple 'volume threshold' insufficient for my idea.

I feel that this could be quite challenging technically, if it's possible at all. my technical vocabulary might not be big enough, but I thought a combination of the following might do the trick: bandpass filtering, the right microphone type and placement, and subtracting the noise/music playback signal from the signal recorded in the room.

1. bandpass filter
2. microphone type
3. microphone placement
4. subtraction of the noise/music playback signal from the recorded room signal
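
for item 4, here is a rough sketch of one possible approach, not a tested solution. instead of a literal subtraction in the frequency domain, it uses a time-domain NLMS adaptive filter (the usual building block of acoustic echo cancellation): the filter estimates how the playback sounds at the microphone and subtracts that estimate, so the residual should mostly contain whatever the playback does not explain, e.g. a voice. the function names `nlms_cancel` and `frame_energy_db` and the values for `taps`, `mu` and `frame` are assumptions made up for illustration, and the sketch assumes the playback signal `ref` and the recording `mic` are sample-aligned arrays at the same sample rate.

```python
import numpy as np

def nlms_cancel(mic, ref, taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `ref` (playback) from `mic`.

    taps, mu and eps are illustrative values, not tuned for a real room.
    """
    w = np.zeros(taps)       # running estimate of the loudspeaker-to-mic response
    buf = np.zeros(taps)     # most recent playback samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        est = w @ buf        # predicted playback as heard at the microphone
        e = mic[n] - est     # residual: the part the playback does not explain
        w += mu * e * buf / (buf @ buf + eps)   # normalised LMS update
        out[n] = e
    return out

def frame_energy_db(x, frame=256):
    """Mean energy per non-overlapping frame, in dB (with a small floor)."""
    n = len(x) // frame
    frames = np.asarray(x[:n * frame]).reshape(n, frame)
    return 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
```

with these two pieces, something like `frame_energy_db(nlms_cancel(mic, ref)) > -30.0` would flag the frames in which the residual is louder than a threshold; the -30 dB value is only a placeholder that would have to be found by experiment. a real room has a much longer response than 64 taps, and a real-time version would need to process audio block by block, so this only illustrates the principle.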

my question is whether anyone can give me an estimate of the feasibility of this approach, as well as suggestions for realising any of the four components above. I feel most clueless about the microphone type and the subtraction part. if anyone has any pointers, a good technical solution or a completely different approach, I'd be grateful. thanks!