Apr 28 2010 | 9:28 am

These are all great ideas. I implemented @swieser1's suggestion, which involves these constraints:

The signal has to meet all of these:
1) Centroid within a set window (e.g. 200–2000 Hz)
2) Stable (measured by the standard deviation of a list of centroid values)
3) Above a level threshold
4) All of the above holding within a few hundred ms
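For anyone who wants to try the four checks outside of Max, here is a rough Python sketch of the gating logic. It is not my patch: the function name, the default thresholds (`max_std_hz`, `min_level`) and the choice of averaging the level over the window are all assumptions for illustration. It expects one centroid value and one level value per analysis frame, covering a few hundred ms.

```python
from statistics import mean, pstdev


def looks_voiced(centroids_hz, levels, lo_hz=200.0, hi_hz=2000.0,
                 max_std_hz=150.0, min_level=0.05):
    """Return True if the window of frames passes all three checks.

    centroids_hz and levels hold one value per analysis frame; together
    they should span a few hundred ms. All thresholds are placeholder
    values, not tuned ones.
    """
    if not centroids_hz or len(centroids_hz) != len(levels):
        return False
    # 1) every centroid lies inside the allowed frequency window
    in_window = all(lo_hz <= c <= hi_hz for c in centroids_hz)
    # 2) the centroid is stable: small standard deviation over the window
    stable = pstdev(centroids_hz) <= max_std_hz
    # 3) the average level over the window is above the threshold
    loud_enough = mean(levels) >= min_level
    return in_window and stable and loud_enough
```

As a rough guide, with a 512-sample hop at 44.1 kHz each frame is about 11.6 ms, so around 26 frames covers roughly 300 ms.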

However, I noticed, as many have mentioned, that the human voice makes some beautiful periodic spikes in the spectrum… The next step is to use that. Maybe by checking whether the spikes are spread out at regular intervals… zsa.freqpeak~ does this…
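One way the "regular intervals" idea could be tested, sketched in Python. This is only a guess at the logic, not what zsa.freqpeak~ does internally: it takes a list of peak frequencies (however you obtain them) and checks whether the gaps between neighbouring peaks are nearly equal. The function name and the `max_rel_spread` and `min_peaks` defaults are made up for the example.

```python
from statistics import mean, pstdev


def peaks_evenly_spaced(peak_freqs_hz, max_rel_spread=0.1, min_peaks=4):
    """Return True if the gaps between sorted peaks are nearly equal."""
    peaks = sorted(peak_freqs_hz)
    if len(peaks) < min_peaks:
        return False  # too few peaks to judge regularity
    # distance between each pair of neighbouring peaks
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    avg_gap = mean(gaps)
    if avg_gap <= 0:
        return False  # duplicate peaks only, nothing to measure
    # spread of the gaps relative to their average size
    return pstdev(gaps) / avg_gap <= max_rel_spread
```

For example, peaks at 220, 440, 660, 880 and 1100 Hz pass, while peaks at 150, 410, 470, 1300 and 1350 Hz do not. Keep in mind that other pitched sounds can have evenly spaced peaks too, so this would probably sit alongside the centroid checks rather than replace them.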

It’s totally fine if this system mistakes a trumpet, oboe, or sax for a human voice. But NOT a car horn, hammer, fridge hum, etc…

I wish I could send my whole patch to you guys, but I have so many externals and I don’t know how to bundle the whole thing up to send :S

Anyway, a must-have is "zsa.descriptors": http://www.e–

  1. Lave.mxe

