Sample retrieval based on similarity

Mar 17, 2013 at 2:57pm

Hello! I’ve got Max/MSP and Max for Live 9. I’m wanting to build a standalone or plugin that allows the user to select/import a sample and then have the software detect specific qualities (BPM, Amplitude, Noise, etc.) of that sample. The software would then select similar samples from a directory of hundreds of samples. That is, it would retrieve samples that are somewhat similar to the selected sample.

Not sure if it would be easier to have the software scan all the samples ahead of time and then just weight the qualities based on the metadata extracted from the scans (that is, the software would scan once, build a database or an XML file, and work from that).
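A minimal sketch of the scan-ahead idea, in Python rather than Max, just to make the structure concrete. Everything here is an assumption for illustration: the `sounds` table layout, the helper names `analyze_wav` and `build_index`, and the choice of duration, RMS and peak as features (BPM or noisiness would need a real analysis library). It only reads 16-bit PCM WAV files.

```python
import math
import sqlite3
import struct
import wave
from pathlib import Path


def analyze_wav(path):
    """Return (duration_seconds, rms, peak) for a 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    # Interleaved little-endian signed 16-bit samples, all channels together.
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    if not samples:
        return 0.0, 0.0, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / count) / 32768.0
    peak = max(abs(s) for s in samples) / 32768.0
    return n_frames / rate, rms, peak


def build_index(sample_dir, db_path):
    """Scan sample_dir once and store one row of features per file."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS sounds ("
        "path TEXT PRIMARY KEY, duration REAL, rms REAL, peak REAL)"
    )
    for wav in sorted(Path(sample_dir).rglob("*.wav")):
        try:
            row = (str(wav),) + analyze_wav(wav)
        except (wave.Error, ValueError, EOFError):
            continue  # skip files this sketch cannot read
        con.execute("INSERT OR REPLACE INTO sounds VALUES (?, ?, ?, ?)", row)
    con.commit()
    return con
```

With the index built once, choosing similar samples becomes a query against the table instead of a fresh analysis pass over hundreds of files.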

Has anyone seen something similar, or have any advice regarding this?

Thanks!

Tom

#67146
Mar 17, 2013 at 3:34pm

Maybe echonest could help:

http://the.echonest.com/

And this max object: https://github.com/echonest/en_analyzer

#241604
Mar 17, 2013 at 7:14pm

mpeg7 encoding mxj by robin price. but i haven’t tested it for complete metadata analysis so far, i just used it for slicing and primarily for analyzing slices. O. http://crx091081gb.net/?p=246#comments

#241605
Mar 17, 2013 at 8:18pm

more precisely: i tried to use the mpeg7 encoder mxj for complete analysis but stumbled upon some strangeness/bugs and couldn’t sort it out yet. i’m also very interested in this, so if someone knows other ways to get in-depth, non-realtime metadata descriptions of audio files, please post. O.

#241606
Mar 17, 2013 at 8:42pm

making sample picking more intelligent: having thousands of samples on your disk indexed in a database, with a special browser to sort samples into predefined or custom categories, sort by ANY attribute or by similarity, or compare with a given sample. that kind of sound browser would have been a nice surprise in live 9. but they still want you to type “bd” into the search field to maybe find some bassdrum samples on your harddrive ;)

#241607
Mar 17, 2013 at 10:22pm

hehe, bloody social democrats, Ableton.

Anyways, I’ve been thinking about something like this, too. I just think I have too much data to go through and analyze it like this – even in a much sped up environment it’d still take ages. [ninja-edit: huh... this mpeg7 thing might be the ticket, actually yeah]

No, gimme a big folder labelled “bd” and then let chance take over. :)

#241608
Mar 19, 2013 at 11:20pm

Good information, everyone! That echonest Max Object looks like it could do the trick!

The Mpeg7 is a possibility too. However, the query by humming sounds like it would be most suited for musical samples (which could be handy). I’m trying to locate found sounds and field recordings. At the very least to analyze them. Echonest is new to me so this is certainly a good place to start.

I think Wetterberg’s onto something – using MPEG7 for searching music sample databases. That could be very useful…

Thanks again!

Tom

#241609
Mar 20, 2013 at 7:47am

I’ve just been doing a similar thing the ‘old fashioned’ way – in Max – for a current project. More specifically by modifying one of the few Max-based SQLite database examples/tutorials (MovieBase) by Andrew Benson and then expanding on that – and learning just enough SQL(ite) on the way. With this setup it is now possible to build compound searches for multiple ‘keywords’/categories etc. and quickly recall matches for playback or whatever and it works quite well. In my case, the sounds (largely ‘non-musical’) have been tagged and categorised manually using my own arbitrary set of (non-analytical) perceptual keywords, but it would of course be possible to at least partly automate the identification of sounds by analysis, so perhaps MPEG7 could be useful there… hmmm…

#241610
Mar 20, 2013 at 5:16pm

so let’s take some steps together.. i did some more testing with the mpeg7 mxj. some descriptors do not affect the sql output, but they are saved to xml, i think.
i can get a maximum of 15 values out of the sql output. i know nothing about the quality of these values or whether that is enough to create a good fingerprint of the sound for comparison. O.

#241611
Mar 20, 2013 at 11:08pm

i connected it with easyDatabase.js. i think i’ll also have to dip into some sql

Attachments:
  1. easyDatabase.js
#241612
Mar 21, 2013 at 2:32am

Not much time for this right now, but here are some random queries/observations:

I sniffed about briefly but didn’t find a description of the mpeg7 descriptors and/or how they are derived. Any pointers/links? Maybe some descriptors will be much more useful than others?

With regard to additional fields and tags for the database that may be useful, a lot will depend on what you want to achieve and how you intend to seek them, but for text based fields there is a (very) extensive listing/breakdown in this document on the Soundminer site: http://www.soundminer.com/current/MI_Whitepaper.pdf

If you do go down the language based descriptor path: In order to avoid dealing with lots of distinct fields I used a single keyword/tag field in my .db and then made use of the SQLite LIKE operator which makes it possible to search for multiples in one field…
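A rough illustration of the single tag field plus LIKE approach, written in Python with the standard `sqlite3` module rather than as a Max patch. The table, the file names, the tags and the `search_tags` helper are all made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sounds (name TEXT, tags TEXT)")
con.executemany(
    "INSERT INTO sounds VALUES (?, ?)",
    [
        ("door_slam.wav", "impact wood short dry"),
        ("rain_on_roof.wav", "weather water long noisy"),
        ("creek.wav", "water long calm"),
    ],
)


def search_tags(con, keywords):
    """Return names whose tags field contains every keyword (AND search)."""
    # One "tags LIKE ?" clause per keyword; "1" matches everything if none given.
    where = " AND ".join("tags LIKE ?" for _ in keywords) or "1"
    params = ["%" + kw + "%" for kw in keywords]
    sql = "SELECT name FROM sounds WHERE " + where + " ORDER BY name"
    return [row[0] for row in con.execute(sql, params)]


print(search_tags(con, ["water", "long"]))  # ['creek.wav', 'rain_on_roof.wav']
```

One caveat with this design: `LIKE '%wood%'` is a substring match, so it would also hit a tag such as "woodwind". Wrapping tags in a delimiter is a common way around that.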

#241613
Mar 22, 2013 at 2:55am

i also downloaded some recommended pdfs, but did not find a clear definition of the individual descriptors. i’m making good progress, though. i imported 20000 files (samples < 10 seconds). it took about 2 hours. the database file is not even 5 MB. and just by ordering the columns (most values are floats) i can now get a picture of what each descriptor does by listening to the top entries ordered ASC/DESC.

LIKE can only be used with character strings. i have rows of floats and a name column. i keep thinking about how similarity between rows can be calculated,
or how to find the closest match in a column to begin with.

aa 1. 2. 2.
bb 5. 2. 6.
cc 2. 3. 3.
aa 2. 5. 3.

i have something like this, and row 3 is a closer match to row 1 than row 2 is, even though row 2 has 1 exact match. and row 4 is maybe a closer match to row 1 than row 3 because the character field has a higher priority in the similarity calculation.
does that require advanced sql?
another big gap on the way to a useful patch is the missing drag n drop ability from max4l to Live’s slots to make use of the results instantly. on windows i can call up an explorer window with the selected file in focus, but that’s a poor consolation. O.
edit: seems like this pdf contains some in depth info at least for the mathematical side of things: ftp://sumin.in.ua/books/DVD-021/Kim_H.,_Moreau_N.,_Sikora_T._MPEG-7_Audio_and_Beyond%5Bc%5D_Audio_Content_Indexing_and_Retrieval_%282005%29%28en%29%28285s%29.pdf
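For the four example rows above, one possible way to express "closeness" is a sum of absolute differences over the float columns, plus a penalty when the name differs. This is only a sketch in Python, not something from the mpeg7 mxj or easyDatabase.js; the `distance` and `rank` helpers and the penalty value of 3.0 are arbitrary assumptions.

```python
rows = [
    ("aa", 1.0, 2.0, 2.0),
    ("bb", 5.0, 2.0, 6.0),
    ("cc", 2.0, 3.0, 3.0),
    ("aa", 2.0, 5.0, 3.0),
]


def distance(query, row, name_penalty=0.0):
    """Sum of absolute differences, plus a penalty when the names differ."""
    numeric = sum(abs(q - r) for q, r in zip(query[1:], row[1:]))
    return numeric + (name_penalty if query[0] != row[0] else 0.0)


def rank(query, candidates, name_penalty=0.0):
    """Return the row numbers of the candidates, closest first."""
    scored = [(distance(query, row, name_penalty), i) for i, row in candidates]
    return [i for _, i in sorted(scored)]


query = rows[0]
others = list(enumerate(rows[1:], start=2))  # (row number, row) pairs

print(rank(query, others))                    # [3, 4, 2]  floats only
print(rank(query, others, name_penalty=3.0))  # [4, 3, 2]  name match has priority
```

With floats only, row 3 (distance 3) beats row 4 (5) and row 2 (8). With the name penalty, rows 2 and 3 each gain 3, so row 4 moves to the front, which matches the intuition described in the post.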

#241614
Mar 22, 2013 at 5:05am

after more research i can now sort by similarity (distance) to an input float with this:
exec "SELECT * FROM sounds ORDER BY ABS(spectralCentroid - 50.) LIMIT 100"
which, for example, gives the 100 closest matches to a spectralCentroid of 50 Hz. and now i can also expand this to sort complete rows by similarity to input values.
it’s exciting for me because this sound indexing thing was one of my first visions after discovering max 4 or 5 years ago :) O.
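A sketch of how the single-column query might be extended to whole rows: sum one weighted ABS term per column in the ORDER BY. It is written in Python with `sqlite3` so it can be run outside Max. The `loudness` column, the sample rows, the weights and the `closest` helper are invented for the example; only `sounds` and `spectralCentroid` come from the query above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sounds (name TEXT, spectralCentroid REAL, loudness REAL)"
)
con.executemany(
    "INSERT INTO sounds VALUES (?, ?, ?)",
    [
        ("kick.wav", 60.0, 0.9),
        ("sub.wav", 45.0, 0.4),
        ("hat.wav", 8000.0, 0.3),
    ],
)


def closest(con, centroid, loudness, w_centroid=1.0, w_loudness=1.0, limit=100):
    """Return names ordered by a weighted sum of per-column distances."""
    sql = (
        "SELECT name FROM sounds "
        "ORDER BY ? * ABS(spectralCentroid - ?) + ? * ABS(loudness - ?) "
        "LIMIT ?"
    )
    params = (w_centroid, centroid, w_loudness, loudness, limit)
    return [row[0] for row in con.execute(sql, params)]


# Equal weights: the centroid (in Hz) dominates the 0..1 loudness column.
print(closest(con, 50.0, 0.9))
# A larger loudness weight lets that column influence the ranking too.
print(closest(con, 50.0, 0.9, w_loudness=100.0))
```

The weights matter because the columns live on very different scales. Without them, a descriptor measured in Hz will swamp one that ranges from 0 to 1, so some form of per-column scaling is probably needed before the row distance is meaningful.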

#241615
Mar 25, 2013 at 2:49pm

and what i found now is even more exciting (and to make this thread more complete for people using the forum search)
there’s Alexander J. Harker’s descriptors~ objects, realtime and buffer version, for mac and win(beta). the entire external package is just great. with awesome help patches.
thank you Alexander.
http://alexanderjharker.co.uk/
edit: the windows version is not public yet, but he was kind enough to give me access after i contacted him :) O.

#241616
Mar 26, 2013 at 1:29am

Nice work 11olsen – looks like you’ve solved a lot of your challenges and thanks to your persistence with this thread there are also more links to useful resources now (ie that pdf and AJHarker externals – the latter of which I had on my system already!!) . As for the Live browser issue you mentioned earlier, perhaps there is a possibility that the Live 9 API will enable scripting of the ‘enhanced’ browser to enable listing of your search results(?). In any case, i definitely see the potential of this stuff for my purposes into the future…

#241617
Mar 26, 2013 at 11:35pm

do you know MEAPsoft?

http://www.meapsoft.org/

http://www.cycling74.com/forums/topic.php?id=6294

there’s the possibility to save the selected chunks to a new .feat or .edl file

it looks, for example, like this

#filename onset_time chunk_length 1.0*AvgPitch(1)

I’ll make an integration in max when I have time.

#241618
Mar 26, 2013 at 11:42pm

the maxmsp integration has already been made… but i don’t have the patches.

#241619
Apr 5, 2013 at 10:58pm

didn’t know that. really useful feature extraction set. the “mean MFCC” alone, for example, seems to output a good “fingerprint” value for the sound. the max integration you’re talking about is only a way to evaluate the output description files of MEAP. that’s not enough to integrate this in an automated process that scans a number of files. i wish all of that would live in an mxj or C external instead of this ##### graphical interface. but that seems not to be far away: it’s open source, with commented source files. O.

#241620
Apr 7, 2013 at 9:21pm

we really should port some MEAPsoft features to an mxj… I can’t help on that side, though.

#241621
Apr 7, 2013 at 9:45pm
#241622
Apr 8, 2013 at 10:57am

brad garton ?
emmanuel jourdan ?
nicolas danet ?
nick rothwell / cassiel ?

#241623
