Big coll : performance hit?
I am wondering if anyone has experience with Very Very Large colls. Like with 50000 lines, or 150000 lines. I assume there is no maximum line limit. If I ask for line# 49834, is there a performance hit?
coll uses a linked list, so in order to find line #49834 it has to iterate through everything. First apply the usual rule: don't optimize, just code! And if you do run into a performance issue, you can for instance split your data among multiple colls by doing something more sophisticated with the key.
Thank you for the information.
Hmmmm . . .ahh, but if I am stepping through the coll using BANG or NEXT message, then perhaps it does not have to iterate through each time you step; it "remembers" or is still "there"? In other words, I step through 1, 2, 3, . . . . 29854, and the time it takes to go from 2 to 3, is the same as the time to go from 29853 to 29854. Is this correct? If so, I am OK with what I am doing, I think . . . .
I am not sure I understand the "usual rule" that you mention. Do you mean, "program what you want, worry about performance issues when they actually happen" ? I am afraid I have done that in past and then been very sad.
[text] doesn’t use a linked list, does it? I am wondering for an upcoming project where I’d need several thousand word entries, each with some category flags. Would this be best to do with [text] on a line-by-line basis, or with [coll]? They wouldn’t need to be changed once they’re in there, just accessed with their flags.
you will often find that it is not possible to replace a [coll] with [text] so easily.
I think my original follow-up question got buried:
So if I am STEPPING through a coll sequentially, does that avoid the possible performance issue of having to iterate through the linked-list for each step?
Is the traversal algorithm smart enough to know that addressing two consecutive indexes (i.e. 1, then 2) is the same as using "next" (i.e. 1, then next)?
Mmm… empirical data…
We see so many topics about [coll] performance issues; sorry to repeat that again, but I find that, because of this "iterate-through-everything" necessity, coll is the most inefficient object in Max. Hundreds of thousands of CPU cycles just to retrieve a single value. ( http://cycling74.com/forums/topic.php?id=28901 )
I might be wrong, but my feeling is that Max users rarely use this linked-list feature of coll, so I think a kind of [arraylist] object is missing in Max. This ideal [arraylist] object wouldn’t need to iterate through everything to retrieve the list #49834 stored in it. It would just look up list #49834 directly.
Bear in mind that coll can use symbols as indexes.
To improve coll’s performance with longer lists, a hashtab would probably be needed. Its performance would be worse than a linked list for (quite) short lists.
There’s a project for somebody. The coll source is available in the SDK – try rewriting it to use a t_hashtab. Doing a "next" in a hashtab is impossible(?), so it’s likely you’d need a t_linklist in parallel anyway (storing just the coll indexes), ie you’re gonna wind up with an additional performance hit when storing/removing indexes etc.
Thanks for the discussion, everybody. I am using coll as a kind of storage for sequencer data (where each line contains a variable number of parameters for different kinds of note/events). So, except for an initial "Go To" event, I’ll be stepping through, so I suppose I am OK with that.
If anyone has other suggestions on how to do sequencing, I’m all ears. . .
I know I’m not the first to use coll this way..
Chris, you say:
I am using coll as a kind of storage for sequencer data (where each line contains a variable number of parameters for different kinds of note/events).
Isn’t [detonate] optimized for this kind of application? Have you looked at that?
[detonate] is really about MIDI notes… I’d probably use [qlist] as a more general tool. But I have also done the same with [coll], and it worked quite well.
@johnpitcairn: unfortunately the source for coll is not in the SDK… it might not be too difficult to replicate its basic behaviour using hashtabs, but it sounds like a very boring job! On the other hand, in the SDK there are a couple of examples dealing with databases, one of which is called "dbcuelist"; it looks quite basic, but not bad at all…
> "I use [jit.matrix] to store data for a sequencer ; but not sure that’s best …"
From what I remember, in terms of efficiency, it is the best way in Max to do that. Around 100x faster than coll.
A while ago, I made some efficiency tests for storing sequence data in an array, comparing jit.matrix, coll, ftm, peek~, buffer~, etc., to store and read values.
mmm… I should have a Max 4 SDK somewhere. But if it’s a pre-obex version, quite a lot of things could have changed. I’ll go and check it out.
I’d be curious too about the performance of a database. In principle, I’d expect it to be very efficient on huge collections of data, not as much for smaller ones.
I was using a coll because I can stash text, numbers, all kinds of things etc. in there. I’m not sure that jit.matrix can do this? (Although of course there are ways around that).