I’m trying to work out how to use Max/MSP to distinguish single-voice speech from a multiple-voice conversation in which voices overlap. In other words, the moment of overlap is what I’m interested in: I want to listen to a spoken conversation between two people and detect when their voices collide or overlap.
I guess that, spectrally, there’s a major difference between a conversation with no overlaps and one in which speech overlaps, so perhaps one direction to look at is detecting sudden spectral changes?
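To make the idea concrete, here is a rough offline sketch in Python rather than a Max patch. It computes spectral flux (the summed increase in magnitude between consecutive spectral frames), which is one common way of measuring sudden spectral change. Whether it actually separates overlap from a single speaker starting a new syllable is exactly what I'm unsure about. The function names, the frame size and hop, and the two-sine test signal standing in for two voices are all made up for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum (bins 0..N/2) via a naive DFT."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(signal, frame_size=64, hop=32):
    """Half-wave rectified spectral flux between consecutive frames.

    Only increases in magnitude are summed, so the value rises when
    new energy appears in the spectrum.
    """
    spectra = [
        magnitude_spectrum(signal[start:start + frame_size])
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]
    return [
        sum(max(cur - prev, 0.0) for cur, prev in zip(b, a))
        for a, b in zip(spectra, spectra[1:])
    ]


def demo_signal(sample_rate=8000, length=1024, onset=512):
    """Toy stand-in for two voices (an assumption, not real speech):
    a 500 Hz tone throughout, joined by a 1500 Hz tone from `onset`."""
    signal = []
    for t in range(length):
        value = math.sin(2 * math.pi * 500 * t / sample_rate)
        if t >= onset:
            value += 0.8 * math.sin(2 * math.pi * 1500 * t / sample_rate)
        signal.append(value)
    return signal


flux = spectral_flux(demo_signal())
peak_frame = max(range(len(flux)), key=flux.__getitem__)
print("largest spectral change between frames", peak_frame, "and", peak_frame + 1)
```

On this toy signal the flux stays close to zero while the spectrum is steady and spikes at the frames where the second tone enters. Real speech would be far noisier, so I'd expect some smoothing and a threshold to be needed on top of this.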
Any thoughts / ideas / pointers for achieving this are very welcome!