Big coll : performance hit?

Christopher Bailey:

I am wondering if anyone has experience with very, very large colls, like 50000 lines, or 150000 lines. I assume there is no maximum line limit. If I ask for line #49834, is there a performance hit?

Emmanuel Jourdan:

coll uses a linklist, so in order to find line #49834 it has to iterate through everything before it. First apply the usual rule: don't optimize prematurely, just code! And if you do hit a performance issue, you can for instance split your data among multiple colls by doing something more sophisticated with the key.

Christopher Bailey:

Thank you for the information.

Hmmmm... ahh, but if I am stepping through the coll using a bang or next message, then perhaps it does not have to iterate through each time you step; it "remembers" where it is, or is still "there"? In other words, I step through 1, 2, 3, ... 29854, and the time it takes to go from 2 to 3 is the same as the time to go from 29853 to 29854. Is this correct? If so, I am OK with what I am doing, I think...

I am not sure I understand the "usual rule" that you mention. Do you mean "program what you want, worry about performance issues when they actually happen"? I am afraid I have done that in the past and then been very sad.

seejayjames:

[text] doesn't use a linked list, does it? I'm wondering for an upcoming project where I'd need several thousand word entries, each with some category flags. Would this be best done with [text] on a line-by-line basis, or with [coll]? They wouldn't need to be changed once they're in there, just accessed via their flags.

Roman Thilenius:

you will often find that it is not possible to replace a [coll] with [text] so easily.

Christopher Bailey:

I think my original follow-up question got buried:

So if I am STEPPING through a coll sequentially, does that avoid the possible performance issue of having to iterate through the linked-list for each step?

Emmanuel Jourdan:

@christopher

Yes it does. next/prev starts from the last known location in the linklist.

Floating Point:

Emmanuel
Is the traversal algorithm smart enough to know that if two consecutive indexes are being addressed (i.e. 1, then 2), this is the same as using next (i.e. 1, then next)?

Chris Muir:

Mmm... empirical data...

Alexandre:

We see so many topics about [coll] performance issues, sorry to repeat this again, but I find that, because of this "iterate-through-everything" necessity, coll is the most inefficient object in Max. Hundreds of thousands of CPU cycles just to retrieve one single value. ( https://cycling74.com/forums/array )

I might be wrong, but my feeling is that Max users rarely rely on this linked-list feature of coll, so I think a kind of [arraylist] object is missing in Max. This ideal [arraylist] object wouldn't need to iterate through everything to retrieve list n°49834 stored in it; it would just look up list n°49834 directly.

johnpitcairn:

Bear in mind that coll can use symbols as indexes.

To improve coll's performance with longer lists, a hashtab would probably be needed. Its performance would be worse than a linked list for (quite) short lists.

There's a project for somebody. The coll source is available in the SDK - try rewriting it to use a t_hashtab. Doing a "next" in a hashtab is impossible(?), so it's likely you'd need a t_linklist in parallel anyway (storing just the coll indexes), ie you're gonna wind up with an additional performance hit when storing/removing indexes etc.

Christopher Bailey:

Thanks for the discussion, everybody. I am using coll as a kind of storage for sequencer data (where each line contains a variable number of parameters for different kinds of note/events). So, except for an initial "Go To" event, I'll be stepping through, so I suppose I am OK with that.

If anyone has other suggestions on how to do sequencing, I'm all ears. . .

I know I'm not the first to use coll this way..

Floating Point:

Chris, you say:
I am using coll as a kind of storage for sequencer data (where each line contains a variable number of parameters for different kinds of note/events).

Isn't [detonate] optimized for this kind of application? Have you looked at that?

andrea agostini:

[detonate] is really about MIDI notes... I'd probably use [qlist] as a more general tool. But I also happened to do the same with [coll], and it worked quite well.

@johnpitcairn: unfortunately the source for coll is not in the SDK... it might not be too difficult to replicate its basic behaviour using hashtabs, but it sounds like a very boring job! On the other hand, in the SDK there are a couple of examples dealing with databases, one of which is called "dbcuelist" - it looks quite basic, but not bad at all...

aa

andrea agostini:

... or you might take a look at the slots of bach.roll and bach.score - www.bachproject.net

aa

Alexandre:

> "I use [jit.matrix] to store data for a sequencer ; but not sure that's best ..."

From what I remember, in terms of efficiency, it is the best way in Max to do that: around 100x faster than coll.
A while ago, I made some efficiency tests for storing sequence data in an array, comparing jit.matrix, coll, ftm, peek~ and buffer~, etc., for storing and reading values:

1540.Efficiencytestonarrays.zip

johnpitcairn:

@andrea agostini: The coll source is in the Max 4 SDK. I suspect it's little different, if at all, in Max 5. I wonder what the comparative lookup times are for a database?

andrea agostini:

mmm... I should have a Max 4 SDK somewhere. But if it's a pre-obex version, quite a lot of things could have changed. I'll go and check it out.

I'd be curious too about the performance of a database. In principle, I'd expect it to be very efficient on huge collections of data, not as much for smaller ones.

aa

Christopher Bailey:

I was using a coll because I can stash text, numbers, all kinds of things etc. in there. I'm not sure that jit.matrix can do this? (Although of course there are ways around that).