Sound Identification
Good Morning!
I'm new to max and can't find a proper solution for the following problem:
I want to stand under a bridge with a microphone and get a bang every time a pair of wheels from the cars above passes. I attached a short recording so you can understand the problem - some are faster or slower, heavier or lighter, sometimes the sounds overlap.
I tried various ways in max, but as I said I don't have much experience. The attached Patcher is my best try so far, but it's still very basic and not accurate enough.
Thanks in advance!
It seems your recording is good enough to work with amplitudes alone, as in your attempt, but you need to treat the audio a little first. Have a look at MSP Dynamics Tutorial 1: Envelope Following in the documentation. You can additionally use filters to narrow the frequency range if necessary. For the comparison I would rather take >~ instead of "if", to stay in the signal domain a bit longer (but maybe that's not important). I don't understand the triple comparison in your patch; I assume you want to count cars, but a car makes two sound events(?) In that case you can simply divide the counter's result by 2.
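In a text language, the envelope-follower-plus-threshold idea looks roughly like this (a minimal Python sketch, not a Max patch; the window size and threshold values are illustrative assumptions - in MSP the same roles would be played by something like average~ and >~):

```python
# Sketch: rectify + smooth the signal (envelope follower),
# then count upward threshold crossings as sound events.
# Window size and threshold are made-up placeholder values.

def envelope(signal, window=64):
    """Rectify and smooth with a simple moving average."""
    out = []
    acc = 0.0
    buf = [0.0] * window
    for i, x in enumerate(signal):
        r = abs(x)                  # full-wave rectification
        acc += r - buf[i % window]  # running sum over the window
        buf[i % window] = r
        out.append(acc / window)
    return out

def count_events(env, threshold=0.2):
    """Count upward threshold crossings (one per sound event)."""
    events = 0
    above = False
    for e in env:
        if e > threshold and not above:
            events += 1
            above = True
        elif e <= threshold:
            above = False
    return events

# Two wheel-pair events would mean one car: count_events(...) // 2
```

The hysteresis flag (`above`) matters: without it, every sample over the threshold would fire a bang, not just the crossing.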
Thank you very much for that advice! I think I understood the tutorial and I have an idea how it can help solve my problem, but I couldn't make it work yet. First, to clarify: I want to play a sound every time a pair of wheels passes, two for each car. Enveloping the sound lets me detect single cars passing, but I still have two problems:
1. The first two cars in the recording I made don't make much noise. So if I filter amplitudes just with >~ *number*, I either can't detect those when using a high number, or the sound of a big truck causes many misdetections if I use a low number. I'm a bit frustrated because this seems so hard to solve while it's so easy to detect by ear. I may have to compare every sample with one a few milliseconds before and output a bang if there's a certain percentage change, but I feel like there has to be a better, simpler solution.
2. (and more important:) I still can't detect multiple cars passing very close to one another. If you listen to the recording from the beginning, you will hear the two cars I mentioned and then two louder cars. Next there are three cars very close to one another. How can I distinguish those? With enveloping, the best I could do was distinguish the first two, but not the third.
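For what it's worth, the "compare with a few milliseconds before" idea from point 1 can be sketched as a relative-change onset detector in Python (a conceptual illustration only, not MSP; the lag, ratio, floor and refractory values are made-up placeholders that would need tuning against the real recording):

```python
# Sketch: report an onset when the envelope jumps by a given ratio
# relative to its value a short lag earlier, so quiet and loud cars
# can share one setting. A refractory period suppresses retriggers
# from the same wheel impact. All parameter values are assumptions.

def detect_onsets(env, lag=100, ratio=2.0, floor=0.01, refractory=150):
    """Return indices where env[i] exceeds env[i - lag] * ratio."""
    onsets = []
    last = -refractory
    for i in range(lag, len(env)):
        prev = max(env[i - lag], floor)   # floor avoids divide-by-noise
        if env[i] > prev * ratio and i - last >= refractory:
            onsets.append(i)
            last = i
    return onsets
```

Because the test is relative, a jump from 0.02 to 0.2 triggers just like a jump from 0.05 to 0.5, which is roughly how the ear judges it too.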
I'm from Germany and therefore it's a bit hard to explain for me, sorry.
try this-- you may need to adjust parameters carefully to get the results you want (i.e. counting cars close to each other), but this more or less works reliably. it has two patches, a pfft and the main patch:
main:
pfft.spectral-subtractor:
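For readers without the attachment: the core of a spectral subtractor, as I understand it, is a per-bin operation on FFT magnitudes - subtract an estimate of the steady noise floor and keep only what sticks out above it (the wheel impacts). A hedged Python sketch of that idea (the function names, the noise-tracking scheme and all values are my assumptions, not taken from the attached pfft~ patch, where this would run per bin per frame):

```python
# Sketch: per-bin spectral subtraction on magnitude frames.
# The FFT itself is omitted; only the bin-level logic is shown.
# All magnitudes below are made-up illustrative values.

def spectral_subtract(frame_mags, noise_mags):
    """Subtract a noise-floor magnitude per bin, clamping at zero."""
    return [max(m - n, 0.0) for m, n in zip(frame_mags, noise_mags)]

def update_noise_estimate(noise_mags, frame_mags, alpha=0.95):
    """Slowly track the noise floor with a leaky average per bin."""
    return [alpha * n + (1 - alpha) * m
            for n, m in zip(noise_mags, frame_mags)]
```

Bins at or below the noise estimate come out as zero, so the broadband traffic rumble is suppressed and only transient energy survives into the amplitude-detection stage.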
Thank you very much for your effort! I only had time to take a short look and couldn't really understand how it works yet, but when I'm back next week I will definitely work with this, thank you!
here is a different approach with amplitudes (old-school), but not as tidy as floating point's version, which I like very much.
btw vara, the gain is different in the first section of the audio. you can clearly hear a change in the signal as well as in the noise.
First of all: Thank you so much for both of your suggestions. I tested both for a while and tried to understand how they work. They obviously work more reliably than my own version, but I still can't get the rhythm of the cars out of them.
If you listen closely to the sound of the three cars close to one another that I described in the second post, you can hear a very distinct pattern, and it's crucial for my project to recreate this as closely as possible, otherwise the end result won't have a convincing effect. In both versions that rhythm gets lost, even when the parameters are good enough to get the right number of bangs (6). I'm afraid I didn't say clearly enough at the beginning how important that is.
The second problem is that if I loop a certain part, I don't always get the same result. Is that because of the timing difference with snapshot?
I attached the part of the recording and the best pattern I could get until now so you can hear the difference. Does anyone have an idea how to improve that?
P.S.: Ben Sonic, yes, you are right. I heard that, but I thought it was a good example of the sound small cars make anyway.
have you tweaked the parameters? there are lots in my example that are inter-related:
low pass filter frequency, the delay of the signal going into the spectral-subtractor, its fft size, frameaverage framesize (needs to be the same as fft size), frameaverage framecount, frameaverage's output multiplier, average~'s number of samples, snapshot duration, threshold level, bang reset time....
having said all that, my example may not be the best approach. there's a package called zsa descriptors which does more sophisticated real-time analysis (mostly frequency domain) that would perhaps be of use-- definitely worth checking out
I think you are reaching the limits of digital analysis. Even with sound descriptors (zsa, mubu) and machine learning (ml-lib), it's still hard to get properly scientific results, but as floating point said, it's worth trying, though.
Isn't your work an artistic one? Maybe I'm too lazy and too easily satisfied...
Another approach is to use additional sensors/microphones. Let's say contact microphones, laser sensors and so on.
Happy patching.