How to recognize a sound and teach it to an AI

jack white:

I would like to create a Max/MSP interface using the ml.lib and zsa.descriptors packages. With this interface I would like to record a sound and then teach it to ml.lib. Once it has learned the sound, ml.lib should trigger a MIDI message whenever it recognizes that sound again.
My problem is that I do not know which object or objects from the ml.lib and zsa.descriptors packages are best suited to this.

These are the objects in ml.lib and in zsa.descriptors:

Floating Point:

This is exactly what I want to do. I haven't really done anything on it yet, but what I'm planning on doing is:
1. use one or more of the zsa descriptors to analyze and map a signal to meaningful parameters.
2. pass those parameters to the ml.lib objects for learning and classification tasks (see the rough sketch after this list)
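
Just to make that two-step flow concrete, here is a rough Python sketch of the same idea outside Max: store labelled feature vectors during a training phase, then match an incoming vector against the stored ones during recognition. The feature names, the numbers, and the nearest-match rule are all placeholders; in the actual patch the zsa.descriptors objects would produce the vectors and an ml.lib object would do the real learning.

    # Rough sketch of the analyze -> learn -> classify flow, outside Max.
    # Everything here (feature names, values, the matching rule) is a
    # placeholder standing in for zsa.descriptors output and an ml.lib object.
    import math

    training_data = []  # list of (label, feature_vector) pairs

    def add_example(label, features):
        # "Training": store a labelled feature vector, as you would send
        # to an ml.lib object together with a class label.
        training_data.append((label, features))

    def classify(features):
        # "Recognition": return the label of the closest stored example,
        # a stand-in for whatever ml.lib classifier you end up using.
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        best_label, _ = min(training_data, key=lambda item: dist(item[1], features))
        return best_label

    # Placeholder feature vectors, e.g. [centroid, rolloff, flux]:
    add_example("snap",  [3200.0, 6500.0, 0.8])
    add_example("knock", [ 900.0, 2100.0, 0.3])
    print(classify([3000.0, 6200.0, 0.7]))  # -> "snap", which would then fire the MIDI note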

I think it really depends on what kind of sounds you want to analyze and how varied they are. You need to ask yourself what aspects of the sound change, and how they change, and in what way they are different from the other sounds in your collection.

Personally, I'm going to start by using the dynamic time warping module, possibly with the zsa.bark object. Why? Because I want to analyze and categorize vocal gestures. The ml.dtw object can recognize similarities in multi-dimensional time series, and the zsa.bark object reduces those dimensions to a manageable number. I haven't started on it yet, but that will be my starting point. I'm sure I will find that I need to complement that approach with other objects (such as zsa.flux) to further refine things.
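
For what it's worth, here is a minimal Python sketch of what dynamic time warping does conceptually, with each frame standing in for a vector of bark-band energies like those zsa.bark produces. This is not ml.lib's implementation, and the gesture data is random placeholder input; it only shows why DTW can compare multi-dimensional time series of different lengths.

    # Minimal DTW over multi-dimensional time series: each row is one
    # analysis frame (e.g. a vector of bark-band energies). Illustration
    # only, not ml.dtw's actual code; the gestures below are random data.
    import numpy as np

    def dtw_distance(a, b):
        # a, b: arrays of shape (n_frames, n_bands). Lower cost = more
        # similar after the two sequences are aligned in time.
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame distance
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

    # Two made-up "gestures": 40 and 55 frames of 24 bands each.
    gesture_a = np.random.rand(40, 24)
    gesture_b = np.random.rand(55, 24)
    print(dtw_distance(gesture_a, gesture_b))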

If you have a different application, the details of this approach will be different. For example, if you want to distinguish a trumpet from a violin, you might want to do a spectral analysis, such as zsa.rolloff, take the average over the first 100 or so milliseconds, and then use a k-NN classifier on your readings. Then you might find that you need to use another zsa object to complement the first one, so you really need to think hard about the nature of the sounds you want to analyze and progress from there.

I believe most of the work would be in getting the data from the audio analysis right first, and "massaging" it into the right form, before any ml.lib object would be of any use. That form might be relatively arbitrary and abstract, and would depend on the nature of your sounds and what qualities you want to extract from them (a rough sketch of the rolloff-plus-k-NN idea is at the end of this post).
I'm just guessing here, but that's what you would need to do too (make educated guesses).
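
To illustrate the trumpet-versus-violin guess, here is a rough Python sketch: compute a rolloff value per analysis frame (roughly what zsa.rolloff reports), average the frames covering the first ~100 ms, and classify that average with a tiny k-NN. The sample rate, frame sizes, training values and labels are all invented for the example, and the k-NN here is hand-rolled rather than anything from ml.lib.

    # Rough sketch of the trumpet-vs-violin idea: per-frame spectral rolloff,
    # averaged over roughly the first 100 ms, classified with k-NN. All data,
    # labels and constants below are invented placeholders.
    import numpy as np

    SR = 44100      # sample rate (assumed)
    FRAME = 1024    # analysis frame size
    HOP = 512       # hop size between frames

    def rolloff(frame, fraction=0.95):
        # Frequency below which `fraction` of the spectral energy lies.
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        energy = np.cumsum(spectrum ** 2)
        if energy[-1] == 0:
            return 0.0
        bin_index = np.searchsorted(energy, fraction * energy[-1])
        return bin_index * SR / len(frame)

    def attack_feature(signal, duration_ms=100):
        # Average rolloff over the frames falling in the first ~100 ms.
        n_samples = int(SR * duration_ms / 1000)
        values = [rolloff(signal[i:i + FRAME])
                  for i in range(0, max(1, n_samples - FRAME), HOP)]
        return np.mean(values)

    def knn(train, query, k=3):
        # train: list of (feature, label); majority vote among the k nearest.
        nearest = sorted(train, key=lambda t: abs(t[0] - query))[:k]
        labels = [label for _, label in nearest]
        return max(set(labels), key=labels.count)

    # Fake training data: (average rolloff in Hz, instrument label).
    train = [(5200.0, "trumpet"), (4900.0, "trumpet"),
             (3100.0, "violin"), (2800.0, "violin")]
    test_signal = np.random.randn(SR)  # placeholder for one second of recorded audio
    print(knn(train, attack_feature(test_signal)))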