Confessions of a Coll convert (some Dict questions)

Rodrigo's icon

Breaking this out from this topic as it was going a little off-topic.

After coming across this great resource that @broc posted, and knowing that dict is more useful/powerful anyway, I've decided to migrate a bunch of my patching (and thinking) to dicts instead of colls.

Towards that, I'm currently stuck on two (seemingly simple) things.

The first is the equivalent of a [dump] message to coll. More specifically, knowing that all the data has finished dumping. One of the objects/entries in my dict contains a huge amount of key/value pairs (over 150k in some instances). Now, with coll I could [dump], and it would bang when it was done, so I could carry on doing whatever else I needed to do with the data. With dict, I know how to query for that one object's data, but not how to know that it is done.

I can easily do this by adding a footer to the end of that string of entries, and take that as the "done" bang, but I want to keep the data structure intact, without extra/useless data (otherwise, I'd just stay with coll).

The second issue is how to send all of that data down into the hierarchy without having to use [regexp].

The data structure (a section of it posted below) is such that there is a top level object "analysis" which contains key/value pairs, where the key is a time in ms ("10", "20", etc...), and the value is an array of values. So to set each of those I am using [append], specifically like this:

append::10 -30.743082 0. 110.99221 -14.119069

Which works great, BUT since the first number is the key, and the append command needs to be formatted as "append::{key}", I don't know how to format the message.

Now, I can think of a way around this using regexp, but I want to avoid that since it's a bit hacky, and I don't trust regexp for high cpu/throughput stuff (have had crash issues in the past).

Is there a way to do that easily? Am I overlooking a message type? Or perhaps there's a better way to structure the data?

What I really need is to be able to send a single message (in this case "get analysis") and get the whole wad of data at once.

{
    "file" :     {
        "name" : "accordion_noise.wav",
        "duration" : 326314.0
    }
,
    "settings" :     {
        "fftparams" : [ 2048, 128 ],
        "descriptors" : "pitch 0.66 median log_centroid 10 20000 mean",
        "windowsize" : 40,
        "overlap" : 10,
        "units" : 32628
    }
,
    "minmeanmax" :     {
        "loudness" : [ -106.759056, -30.991779, -3.235133 ],
        "pitch" : [ 0.0, 67.068283, 140.0 ],
        "centroid" : [ 46.399357, 91.766556, 125.449028 ],
        "sfm" : [ 118.388718, -27.2827, -4.648997 ]
    }
,
    "histogram" :     {
        "loudness" : [ 0.00175, 0.0, 0.000875, 0.005249, 0.005249, 0.006124, 0.010499, 0.011374, 0.019248 ],
        "pitch" : [ 0.001116, 0.0, 0.001116, 0.001116, 0.00558, 0.001116, 0.001116, 0.002232, 0.001116, 0.001116 ],
        "centroid" : [ 0.002786, 0.002089, 0.018106, 0.027855, 0.034123, 0.052925, 0.086351, 0.117688, 0.154596, 0.193593, 0.283426, 0.332869 ],
        "sfm" : [ 0.000808, 0.001617, 0.006467, 0.007276, 0.008892, 0.012935, 0.012126, 0.017785, 0.017785 ]
    }
,
    "analysis" :     {
        "0" : [ -30.743082, 0.0, 110.99221, -14.119069 ],
        "10" : [ -29.847477, 0.0, 102.78936, -18.857525 ],
        "20" : [ -30.850266, 0.0, 106.913391, -14.339486 ],
        "30" : [ -34.505981, 0.0, 100.441933, -19.840559 ],
        "40" : [ -37.735123, 0.0, 96.140182, -22.903746 ],
        "50" : [ -45.966511, 0.0, 90.787933, -27.842623 ],
        "60" : [ -38.871426, 124.842621, 103.400925, -15.41413 ],
        "70" : [ -35.512058, 0.0, 105.463936, -14.856118 ],
        "80" : [ 31.557777, 0.0, 114.881561, -7.335074 ],
        "90" : [ 24.237932, 0.0, 106.483566, -12.101012 ],
        "100" : [ 24.299706, 0.0, 101.199158, -17.09067 ],
        "110" : [ 29.310883, 0.0, 95.058838, -24.040127 ]
    }
}
andrea agostini's icon

Hi Rodrigo.

I'm probably saying the obvious, plus I'm not a dict wizard, but

- I'd expect dict's behavior with respect to queries to be consistently synchronous. This means that you can always know when a query has been performed just by waiting for a bang to come out on the left:

Max Patch
Copy patch and select New From Clipboard in Max.

- As for building the keys for appending, I'm probably missing something but can't you just use sprintf or combine?

On the other hand, I'd be a bit worried about 150k key/value pairs: since unfortunately all keys are symbols, I think you're stressing the Max symbol table quite a bit...

Hope this helps,

aa

Rodrigo's icon

Hmm, I didn't think that it would be synchronous, but yeah that makes sense. The dict gets .iter'd afterwards.

In that case, would the dict iterate completely before sending the bang?

I did play with combine/sprintf some, but just assumed that I was missing something about how to structure this. Given your final comment that may still be the case. (more on this below)

The solution is quite simple with [zl slice] + [combine]:

Max Patch
Copy patch and select New From Clipboard in Max.
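For readers outside Max, the [zl slice 1] + [combine] idea boils down to splitting off the first element of each flat frame as the key and formatting the rest as the value. A rough Python sketch of that split (the frame contents are just the example data from above):

```python
def make_append_message(frame):
    """Turn a flat analysis frame into a dict 'append' message.

    The first element becomes the time key, the rest the values --
    the same split that [zl slice 1] performs before [combine]
    glues the key onto "append::".
    """
    key, values = frame[0], frame[1:]
    return "append::{} {}".format(key, " ".join(str(v) for v in values))

msg = make_append_message([10, -29.847477, 0.0, 102.78936, -18.857525])
# msg is "append::10 -29.847477 0.0 102.78936 -18.857525"
```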

So with regards to the symbol table, would you suggest storing each of the entries as an array of objects? Or would that not make any difference? It should be noted that I may be loading multiple of these dicts at once... (though the dict is really just a storage/recalling mechanism for other parts of the patch)

stkr's icon

guarantee the dict dump is complete with [deferlow]:

Max Patch
Copy patch and select New From Clipboard in Max.

or, if unpacking keys further down the line, just get a [t b] from the final (leftmost) one.

stkr's icon

maybe i am totally misunderstanding, but getting keys and parsing the data all over your max planet without touching much of the max symbol table is very simple: always use the dict objects!

testForRod.maxpat
Max Patch

Rodrigo's icon

Hah, my dict hero(s) to the rescue!

So for the first bit, that makes sense. Wish there was a dedicated 'done' bang, but this will work.

I didn't realize that you could just break apart the dictionary that way (total dict noob) but that seems handy.

However, I'm not sure how one would extract all of the analysis data, short of requesting each individual key, particularly when I need the key as well. In your example I would need an impossibly long [dict.unpack 10: 20: <-----> 1475550:], or would need to individually request them out of the sub-dictionary using [uzi 15000] * 10(?).

Basically I need to output this:

0 -30.743082 0. 110.99221 -14.119069
10 -29.847477 0. 102.78936 -18.857525
20 -30.850266 0. 106.913391 -14.339486
30 -34.505981 0. 100.441933 -19.840559
40 -37.735123 0. 96.140182 -22.903746
50 -45.966511 0. 90.787933 -27.842623
60 -38.871426 124.842621 103.400925 -15.41413
70 -35.512058 0. 105.463936 -14.856118
80 -31.557777 0. 114.881561 -7.335074
90 -24.237932 0. 106.483566 -12.101012
100 -24.299706 0. 101.199158 -17.09067
110 -29.310883 0. 95.058838 -24.040127
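What [dict.iter] does for this case is, in essence, flatten the nested structure into one "key values..." line per entry. A Python sketch of that flattening, using a couple of the frames from the JSON above (this is an analogue, not how [dict.iter] is implemented):

```python
# A slice of the "analysis" sub-dictionary from the posted JSON.
analysis = {
    "0": [-30.743082, 0.0, 110.99221, -14.119069],
    "10": [-29.847477, 0.0, 102.78936, -18.857525],
}

# Emit one "key v1 v2 v3 v4" line per entry, the way coll's dump
# (or dict.iter on the sub-dictionary) spits them out.
lines = [
    "{} {}".format(key, " ".join(str(v) for v in values))
    for key, values in analysis.items()
]
```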

Perhaps this should then be a single object, where all of these entries are objects in an array, so I only query for the 'analysis' key, and get all of this data as the 'value'?

Lastly, the [combine analysis::] method didn't work at all. It hung my computer up for 30min+ (before I finally force quit it), so that's not a good way of getting the data into a dict at all.

So the deferlow bang = done bit I get. It's now just a matter of getting this huge amount of data into, and out of, a dict.

LSka's icon

[dict.iter] ?

or simply collect the results in another dict and dynamically query the keys you need (have a look at the patch below for a couple examples)

Max Patch
Copy patch and select New From Clipboard in Max.
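The "collect the results in another dict, then query the keys you need" idea can be sketched in Python (the function names here are made up for illustration, not Max messages):

```python
results = {}

def store_frame(time_ms, values):
    # Store each analysis frame under its time key, the way a
    # collector dict accumulates entries as they arrive.
    results[str(time_ms)] = values

def query(time_ms):
    # Build the key dynamically at query time, analogous to
    # sending "get analysis::<time>" with a computed key.
    return results.get(str(time_ms))

store_frame(10, [-29.847477, 0.0, 102.78936, -18.857525])
```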

Rodrigo's icon

@LSKA

Yeah that does it, and is what I was doing with this data structure before; the dict.unpack threw me. Just dumping (iter-ing) through is ideal.

So the remaining problem is packing all that information in, specifically if I shouldn't use [combine analysis::] to get it in there.

The data would be generated from an offline (uzi-driven) analysis of a buffer~ (using Alex Harker's [descriptors~]).

Like this, but with a much much longer audio file.

Max Patch
Copy patch and select New From Clipboard in Max.

(which does work, by the way, I'm just concerned about using up the symbol table as per @Andrea's comments)

LSka's icon

you can use pull_from_coll or dict.group:

Max Patch
Copy patch and select New From Clipboard in Max.

stkr's icon

yes i ignored certain conditions, as well as filling, when i posted yesterday :-)

i would go with lska's method, but just completely ignore coll / symbol table. also generalise it. so, the attached patch will take any arbitrary audio file and analyse it at the 'window' size and store the results at 'overlap' index increments. you could easily impose a filesize limit on size of analysis, too.

note also that you could simply accumulate a master dict with new analysis entries stored by the file name rather than a single 'analysis' dict every time, with just a few tweaks. if that is something which could be helpful / required.

Max Patch
Copy patch and select New From Clipboard in Max.

Rodrigo's icon

Yup, that dict.group method works well!

I'm still struggling with the order of events when creating the dict, so the "header" (name/info/etc...) part appears before the "analysis" etc... I know it doesn't matter when querying the dict (one of the perks of dicts), but this is just to aid the human-readability of the json file.

For example, [dict.pack analysis:] wipes out anything I've put in the dict before that point, so I need to [dict.join] together the file I want. Nothing too complicated, but a different way of thinking, especially since things unpack "backwards" (right to left) in the Max world, but things get put in "forwards" (top to bottom) to dict.
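Since Python dicts also preserve insertion order, the [dict.join] ordering trick reads naturally as a sketch: build the header first, then merge the analysis in after it, so the saved JSON keeps the file info at the top. (Key names follow the JSON posted earlier; this is an analogue of the Max flow, not it.)

```python
# Header built first (file info, settings, etc.).
header = {"file": {"name": "accordion_noise.wav", "duration": 326314.0}}

# Bulk analysis packed separately.
analysis = {"analysis": {"0": [-30.743082, 0.0, 110.99221, -14.119069]}}

# Join header first, then analysis, so insertion order -- and hence
# the human-readable layout of the exported JSON -- puts the header
# before the big analysis block.
joined = {}
joined.update(header)
joined.update(analysis)
```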

The bigger patch that this analysis bit comes from takes arbitrary window/overlap, and does all the maths for you (including when to stop analyzing so that you don't analyze past the end of the buffer with the overlap/windowing). I was thinking about putting in a check for duration, but I don't know what the maximum duration would be, so if people want to try passing a 30min audio file into it, maybe it will work. A check for a file shorter than the window size would be smart though.

Max Patch
Copy patch and select New From Clipboard in Max.

re: accum

That's a good point, and something I may look into for the purposes of using this in M4L. At the moment, each analysis file is its own dict file, since the dict file is only used for storing/loading. Once it ends up in the actual patch, the contents of the dict don't matter, and get erased each time. That being said, I do exactly that accum thing in the patch to create a "mega dict" (in the form of Alex Harker's [entrymatcher]), which is where all the real-time querying happens. But it might be useful to keep a dict of dicts for storing/recalling the state of a M4L device.

Aaaaand, as an aside, I worked out a slick way of using a deferlow'd counter (instead of uzi), so that analyzing large files happens without pinwheeling the computer, and showing a nifty progress bar. Doesn't look like much in this example, but when you have a 20min audio file, it can take a really long time, depending on your CPU speed.

Max Patch
Copy patch and select New From Clipboard in Max.
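The deferlow'd-counter idea is basically cooperative chunking: process one analysis window per pass and report progress after each, rather than blasting through the whole job in one tight [uzi] loop that locks up the scheduler. A rough Python analogue of that control flow (the `analyze_window` callback is hypothetical):

```python
def analyze_in_chunks(total_windows, analyze_window, on_progress):
    """Process windows one at a time, reporting progress after each,
    instead of blocking in a single tight loop (the uzi approach)."""
    for i in range(total_windows):
        analyze_window(i)          # analyze one window of the buffer
        on_progress((i + 1) / total_windows)  # drive a progress bar

progress = []
analyze_in_chunks(4, lambda i: None, progress.append)
# progress is [0.25, 0.5, 0.75, 1.0]
```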

Rodrigo's icon

Ok, so this is all working great. Thanks to all the help on here I've gotten a nice and neat dict data structure working, but there is a pretty big downside.

Performance-wise, coll->dump happens about 10 times faster than get->dict->dict.iter. So something that takes only 2 sec to dump out of coll is taking over 20 sec to dict.iter out.

This is a bit of a problem in a patch that needs to load huge amounts of data from multiple databases. With coll you would get a bit of pinwheeling, but it wasn't intrusive. With dict it's now a whole waiting game to get the patch up and running.

edit: And trying to move the analysis data into coll first with push_to_coll takes as long as dict.iter, so using coll as a proxy this way doesn't work either.

The actual data set is too long to post here but it's basically 33k lines of this kind of thing:

0 -30.743082 0. 110.99221 -14.119069
10 -29.847477 0. 102.78936 -18.857525
20 -30.850266 0. 106.913391 -14.339486
30 -34.505981 0. 100.441933 -19.840559
40 -37.735123 0. 96.140182 -22.903746
50 -45.966511 0. 90.787933 -27.842623

Max Patch
Copy patch and select New From Clipboard in Max.