Big coll : performance hit?

Dec 10, 2010 at 4:33pm

I am wondering if anyone has experience with very, very large colls, like 50,000 or 150,000 lines. I assume there is no maximum line limit. If I ask for line #49834, is there a performance hit?

#53854
Dec 11, 2010 at 1:23am

coll uses a linked list, so in order to find line #49834 it has to iterate through everything before it. First apply the usual rule: don’t optimize, just code! And if you do run into a performance issue, you can, for instance, split your data among multiple colls by doing something more sophisticated with the key.
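
To illustrate the point, here is a minimal Python sketch (not Max code): fetching entry N from a singly linked list means walking N links from the head, which is why random access into a huge coll gets expensive.

```python
# Illustrative sketch (Python, not Max): why fetching line N from a
# singly linked list costs N hops.

class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def build(values):
    """Build a singly linked list, returning the head node."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def nth(head, n):
    """Walk n links from the head: O(n), unlike an array's O(1) indexing."""
    node = head
    for _ in range(n):
        node = node.next
    return node.value

lines = build(range(50000))
print(nth(lines, 49834))  # walks 49834 links to get there
```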

#193841
Dec 11, 2010 at 5:06am

Thank you for the information.

Hmmm... ah, but if I am stepping through the coll using a bang or next message, then perhaps it does not have to iterate through the list each time I step; it “remembers” where it is? In other words, if I step through 1, 2, 3, ... 29854, the time it takes to go from 2 to 3 is the same as the time to go from 29853 to 29854. Is this correct? If so, I am OK with what I am doing, I think...

I am not sure I understand the “usual rule” you mention. Do you mean, “program what you want, worry about performance issues when they actually happen”? I am afraid I have done that in the past and then been very sad.

#193842
Dec 11, 2010 at 5:14am

[text] doesn’t use a linked list, does it? I am wondering for an upcoming project where I’d need several thousand word entries, each with some category flags. Would this be best done with [text] on a line-by-line basis, or with [coll]? They wouldn’t need to be changed once they’re in there, just accessed along with their flags.

#193843
Dec 11, 2010 at 8:15am

You will often find that replacing a [coll] with [text] is not so easy.

#193844
Dec 11, 2010 at 1:35pm

I think my original follow-up question got buried:

So if I am STEPPING through a coll sequentially, does that avoid the possible performance issue of having to iterate through the linked-list for each step?

#193845
Dec 11, 2010 at 5:25pm

@christopher

Yes, it does. next/prev starts from the last known location in the linked list.
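
A sketch of that cursor behaviour (hypothetical Python, not the actual coll implementation): keeping a pointer to the last node visited makes each next a single hop, no matter how deep into the list you are.

```python
# Sketch of the "remembered position" behaviour: sequential next is one
# hop regardless of how deep into the list you are.

class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

class CursorList:
    def __init__(self, values):
        self.head = None
        for v in reversed(values):
            self.head = Node(v, self.head)
        self.cursor = self.head  # last known location

    def next(self):
        value = self.cursor.value
        self.cursor = self.cursor.next  # one hop from the last position
        return value

seq = CursorList(range(30000))
steps = [seq.next() for _ in range(5)]
print(steps)  # each call advanced one link, not a fresh walk from the head
```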

#193846
Dec 11, 2010 at 5:27pm

Hello Christopher Bailey,

Test ?

– Pasted Max Patch, click to expand. –
#193847
Dec 12, 2010 at 12:23am

Emmanuel
Is the traversal algorithm smart enough to know that addressing two consecutive indexes (i.e. 1, then 2) is the same as using next (i.e. 1, then next)?

#193848
Dec 12, 2010 at 4:47pm

Hello Terry McDermott,

Is the traversal algorithm smart enough to know that addressing two consecutive indexes (i.e. 1, then 2) is the same as using next (i.e. 1, then next)?

According to this test, I should say NO ;-)

– Pasted Max Patch, click to expand. –
#193849
Dec 12, 2010 at 6:32pm

Mmm… empirical data…

#193850
Dec 12, 2010 at 7:52pm

We see so many topics about [coll] performance issues; sorry to repeat it again, but I find that, because of this “iterate-through-everything” necessity, coll is the most inefficient object in Max. Hundreds of thousands of computer cycles just to recover one single value. ( http://cycling74.com/forums/topic.php?id=28901 )

I might be wrong, but my feeling is that Maxers rarely use this “linked list” feature of coll, so I think a kind of [arraylist] object is missing in Max. This ideal [arraylist] object wouldn’t need to iterate through everything to retrieve list n°49834; it would just look it up directly.
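
The idea can be sketched in a couple of lines of Python (illustrative only): contiguous array storage lets you jump straight to entry N without traversing anything.

```python
# Illustrative sketch: contiguous (array) storage gives constant-time
# access to entry N, with no list traversal.

entries = [f"line {i}" for i in range(150000)]
print(entries[49834])  # one address computation, not 49834 hops
```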

#193851
Dec 12, 2010 at 8:41pm

Bear in mind that coll can use symbols as indexes.

To improve coll’s performance with longer lists, a hashtab would probably be needed. Its performance would be worse than a linked list’s for (quite) short lists.

There’s a project for somebody. The coll source is available in the SDK; try rewriting it to use a t_hashtab. Doing a “next” in a hashtab is impossible(?), so you’d likely need a t_linklist in parallel anyway (storing just the coll indexes), i.e. you’re going to wind up with an additional performance hit when storing/removing indexes etc.
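
A rough sketch of that design in Python (names hypothetical; this is not the Max SDK t_hashtab API): a hash map gives constant-time keyed lookup, while a parallel ordered list of keys keeps “next” working, at the cost of the extra bookkeeping on every store that’s mentioned above.

```python
# Hypothetical sketch: hash map for O(1) keyed lookup, plus a parallel
# key list (standing in for the t_linklist) so "next" still works.

class HashColl:
    def __init__(self):
        self.table = {}   # key -> data
        self.order = []   # insertion-ordered keys
        self.pos = 0

    def store(self, key, data):
        if key not in self.table:
            self.order.append(key)  # the extra bookkeeping on insert
        self.table[key] = data

    def lookup(self, key):
        return self.table[key]  # no traversal, whatever the key

    def next(self):
        key = self.order[self.pos % len(self.order)]
        self.pos += 1
        return key, self.table[key]

c = HashColl()
c.store(49834, ["note", 60, 100])
c.store("intro", ["marker"])     # symbol keys work too, as with coll
print(c.lookup(49834))           # direct lookup, no iteration
```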

#193852
Dec 13, 2010 at 5:23am

Thanks for the discussion, everybody. I am using coll as a kind of storage for sequencer data (where each line contains a variable number of parameters for different kinds of note/events). So, except for an initial “Go To” event, I’ll be stepping through, so I suppose I am OK with that.

If anyone has other suggestions on how to do sequencing, I’m all ears. . .

I know I’m not the first to use coll this way..

#193853
Dec 13, 2010 at 7:06am

Hello Christopher Bailey,

I use [jit.matrix] to store data for a sequencer ; but not sure that’s best …

#193854
Dec 13, 2010 at 10:00am

Chris, you say:
I am using coll as a kind of storage for sequencer data (where each line contains a variable number of parameters for different kinds of note/events).

Isn’t [detonate] optimized for this kind of application? Have you looked at that?
T

#193855
Dec 13, 2010 at 11:32am

[detonate] is really about MIDI notes… I’d probably use [qlist] as a more general tool. But I have also happened to do the same with [coll], and it worked quite well.

@johnpitcairn: unfortunately the source for coll is not in the SDK… it might not be too difficult to replicate its basic behaviour using hashtabs, but it sounds like a very boring job! On the other hand, in the SDK there are a couple of examples dealing with databases, one of which is called “dbcuelist”; it looks quite basic, but not bad at all…

aa

#193856
Dec 13, 2010 at 11:36am

… or you might give a look at the slots of bach.roll and bach.score – http://www.bachproject.net

aa

#193857
Dec 13, 2010 at 8:37pm

> “I use [jit.matrix] to store data for a sequencer ; but not sure that’s best …”

From what I remember, in terms of efficiency, it is the best in Max for that: around 100× faster than coll.
A while ago, I made some efficiency tests for storing sequence data in an array, comparing jit.matrix, coll, ftm, peek~, buffer, etc. for storing and reading values:
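
A rough Python analogue of such a comparison (illustrative only; the original tests compared actual Max objects, not Python code): timing direct indexing against a simulated walk to the same position.

```python
# Illustrative benchmark sketch: direct indexing vs. walking to an entry.
import timeit

setup = "data = list(range(150000))"

# direct indexing, as an array-style store would allow
indexed = timeit.timeit("data[49834]", setup=setup, number=1000)

# simulate a linked-list lookup by scanning up to the same position
walked = timeit.timeit(
    "next(x for i, x in enumerate(data) if i == 49834)",
    setup=setup, number=1000)

print(f"indexed: {indexed:.5f}s  walked: {walked:.5f}s")
```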

#193858
Dec 14, 2010 at 7:20am

Hello Alexandre,

[jit.matrix] has a few problems for data-collection use:

1. It cannot be zero-sized; at least one cell is required.
2. You always need [jit.qfaker] to run it at high priority.
3. It cannot easily be resized without losing values.

But it seems the best native object ;-)

PS: I didn’t test it with 150,000 values…

#193859
Dec 14, 2010 at 7:40pm

@andrea agostini: The coll source is in the Max 4 SDK. I suspect it’s little different, if at all, in Max 5. I wonder what the comparative lookup times are for a database?

#193860
Dec 14, 2010 at 10:25pm

mmm… I should have a Max 4 SDK somewhere. But if it’s a pre-obex version, quite a lot of things could have changed. I’ll go and check it out.

I’d be curious too about the performance of a database. In principle, I’d expect it to be very efficient on huge collections of data, not as much for smaller ones.

aa

#193861
Dec 15, 2010 at 12:54am

I was using a coll because I can stash text, numbers, all kinds of things in there. I’m not sure that jit.matrix can do this? (Although of course there are ways around that.)

#193862
