gensym every output string?

Lee's icon

Hi, from my understanding, if I want to send a character string out of an outlet using the SDK then I have to gensym() it and that means it'll hang around forever?

Is that correct? thx

Peter Castine's icon

That's the deal.

The point of symbols is that you're passing around pointers, so comparison for equality is O(1) (in technical terms: it is really, really fast). And that's practically the only thing the original Max did with symbols (run through a list of methods until you find the one that matches a symbol that's come in an inlet, then call that method). OTOH, testing for equality between strings is slow: the comparison can stop at the first mismatched character, but the worst case is O(N) in the length of the string.
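To make the pointer argument concrete, here is a minimal sketch of string interning in plain C. It is not the Max SDK: `toy_symbol` and `toy_gensym()` are hypothetical stand-ins invented for this example, and the linear list is only there to keep the code short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a symbol. This is NOT the Max SDK's t_symbol. */
typedef struct toy_symbol {
    char *name;
    struct toy_symbol *next;
} toy_symbol;

/* Entries are never freed, which is the "hangs around forever" part. */
static toy_symbol *toy_table = NULL;

/* Hypothetical toy_gensym(): return the single shared entry for this text,
   creating it on first use. */
toy_symbol *toy_gensym(const char *s)
{
    for (toy_symbol *p = toy_table; p != NULL; p = p->next)
        if (strcmp(p->name, s) == 0)
            return p;                 /* already interned */

    toy_symbol *n = malloc(sizeof *n);
    assert(n != NULL);
    n->name = malloc(strlen(s) + 1);
    assert(n->name != NULL);
    strcpy(n->name, s);
    n->next = toy_table;
    toy_table = n;
    return n;
}

/* Once both sides are interned, equality is a single pointer comparison. */
int toy_same(const toy_symbol *a, const toy_symbol *b)
{
    return a == b;
}
```

Calling `toy_gensym("bang")` twice hands back the same pointer, so `toy_same()` never has to look at the characters. The string comparison cost is paid once, at lookup time, rather than on every message dispatch.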

You can sort of get around this by using Jitter char matrices to represent strings if you really want to avoid using gensym().

Lee's icon

Yeah, thx. I understand the point, just thinking that when the symbol table gets big, it's quite an overhead to have to search it all the time just to send a string out of an outlet... if there are a lot of messages coming through then it's a lot of work to do..

don't know the data structure behind it, but i'm guessing it's not just going to be a list traversal, so maybe it's not much of an overhead anyway...

Lee's icon

i guess though, looking up your string in a table of 100,000 strings, with all those comparisons, will take time anyway, unless they 1-way hash the strings or something like that?

Timothy Place's icon

It's a hash table internally, so it should scale fairly well, but symbol-table bloating is still a potential problem. For example, you wouldn't want to gensym() a new string for the current time every second just to display it.
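A rough sketch of both halves of that point, in plain C rather than the SDK. Everything here is an assumption made for illustration: the fixed bucket count, the djb2 hash, and the names `intern()` and `intern_clock_strings()` are all hypothetical, and nothing in this thread says what Max actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024   /* arbitrary size chosen for this sketch */

typedef struct entry {
    char *name;
    struct entry *next;
} entry;

static entry *buckets[NBUCKETS];
static size_t entry_count = 0;      /* only ever grows */

/* djb2 string hash: a common choice, not necessarily what Max uses. */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Hypothetical intern(): find or add, searching one bucket only. */
const char *intern(const char *s)
{
    entry **head = &buckets[hash_str(s) % NBUCKETS];
    for (entry *e = *head; e != NULL; e = e->next)
        if (strcmp(e->name, s) == 0)
            return e->name;

    entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->name = malloc(strlen(s) + 1);
    assert(e->name != NULL);
    strcpy(e->name, s);
    e->next = *head;
    *head = e;
    entry_count++;
    return e->name;
}

/* Intern one "HH:MM:SS" string per second; return the table size afterwards. */
size_t intern_clock_strings(int seconds)
{
    char buf[32];
    for (int t = 0; t < seconds; t++) {
        snprintf(buf, sizeof buf, "%02d:%02d:%02d",
                 t / 3600 % 24, t / 60 % 60, t % 60);
        intern(buf);
    }
    return entry_count;
}
```

The lookup only walks the one bucket the hash points at, which is why it scales. The bloat is the other side: an hour of clock strings leaves 3,600 permanent entries, and a timestamp that also includes the date would never repeat, so the table would keep growing for as long as the patch runs.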

Another way to pass strings without adding to the symbol table is to use dictionaries. For me, this would be a lot more convenient than using Jitter matrices, but your mileage may vary...

- .. --

Peter Castine's icon

FWIW, I gensym()'d the entire I Ching line-by-line without a noticeable performance hit (and that was back in G3 days). That was only in the 1,000-symbol ballpark; with today's hardware I would guess that your 100,000 symbols would be possible. But I haven't tried stress-testing (and I don't know if anyone at Cycling has actually tried to push gensym() to its limits, either).

If you do generate enough symbols to notice a performance hit, please do let us know. Up to now I've only read general admonitions from various sources not to push gensym() too far, but no one's gotten specific about how far "too" is. Limits will obviously move as hardware continues to get faster, but curious minds still want to know.-)

Lee's icon

@Timothy, exactly what I'm doing with a timestamp :) Not looked into dictionaries yet...

@Peter, will add some stats and see what the numbers are in reality and get back.. still at the beginning of the C++ port, so may be a couple of weeks

Whatever the results tho, I'd still think the ability to throw a string out of an outlet without having to go through gensym() would be a desirable thing?

Lee's icon

oops, got cut off...

was going to say that, after all, if you just want to output a string, the nicety of having fast comparison becomes immaterial, and having to go through a lookup for each string field becomes a performance hit?

Peter Castine's icon

Another FWIW: I whipped up a patch with a [button]->[uzi]->[random]->[sprintf symout %d] chain going into the right inlet of a message box. With uzi at 100,000 the response to the button was close to immediate. At 1,000,000, Max did beachball for two or three seconds.

So I think you ought to be able to timestamp every second for a couple of hours without a real performance hit. OTOH, there may be sporadic hits if/when the hash table has to adjust its size or something. Look forward to hearing how things work for you. Mileage is bound to vary depending on exactly what anyone wants to do. And if this is an installation that you want to let run for years on end, things may look significantly different.

--
PS: Actually, I used [lp.tata] instead of [random] because it generates random deviates over the entire 32-bit range, but a [random 2000000000] is probably an acceptable substitute.
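On the "sporadic hits" guess: whether Max's table resizes at all isn't stated anywhere in this thread, so the following is only a sketch of how a generic growing hash table behaves, in plain C. The `table` type, the doubling policy, and the functions `table_init()`, `table_add()` and `table_fill()` are all hypothetical, and the fill uses sequential numbers instead of random ones so the result is repeatable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct node {
    char *name;
    struct node *next;
} node;

typedef struct {
    node **slots;
    size_t nslots;
    size_t count;
    size_t rehashes;   /* how many times the table had to grow */
} table;

static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;            /* djb2, chosen only for brevity */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

void table_init(table *t, size_t nslots)
{
    t->slots = calloc(nslots, sizeof *t->slots);
    assert(t->slots != NULL);
    t->nslots = nslots;
    t->count = 0;
    t->rehashes = 0;
}

/* Double the slot array and relink every node: the occasional O(n) step. */
static void table_grow(table *t)
{
    size_t n = t->nslots * 2;
    node **fresh = calloc(n, sizeof *fresh);
    assert(fresh != NULL);
    for (size_t i = 0; i < t->nslots; i++) {
        node *e = t->slots[i];
        while (e != NULL) {
            node *next = e->next;
            size_t j = hash_str(e->name) % n;
            e->next = fresh[j];
            fresh[j] = e;
            e = next;
        }
    }
    free(t->slots);
    t->slots = fresh;
    t->nslots = n;
    t->rehashes++;
}

void table_add(table *t, const char *s)
{
    size_t i = hash_str(s) % t->nslots;
    for (node *e = t->slots[i]; e != NULL; e = e->next)
        if (strcmp(e->name, s) == 0)
            return;                    /* already present */
    if (t->count >= t->nslots) {       /* keep the load factor at or below 1 */
        table_grow(t);
        i = hash_str(s) % t->nslots;
    }
    node *e = malloc(sizeof *e);
    assert(e != NULL);
    e->name = malloc(strlen(s) + 1);
    assert(e->name != NULL);
    strcpy(e->name, s);
    e->next = t->slots[i];
    t->slots[i] = e;
    t->count++;
}

/* Add n distinct numeric strings, roughly what the uzi patch feeds gensym(). */
void table_fill(table *t, int n)
{
    char buf[32];
    for (int i = 0; i < n; i++) {
        snprintf(buf, sizeof buf, "%d", i);
        table_add(t, buf);
    }
}
```

Starting from 16 slots, filling this table with 100,000 strings grows it 13 times. Most insertions are cheap and a handful pay for a full relink, which is the kind of sporadic hit being guessed at. If the real table uses a fixed number of buckets instead, there would be no such spikes, only gradually longer bucket chains.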

Lee's icon

nice, thanks :)