Symbol table - how does it behave as the number of symbols grows
I've implemented a system for doing chord recognition, using the cute trick I saw in an old discussion where a chord is converted to a symbol that can then be stored as a single entity in a coll.
I'm going to end up with thousands of these things because I want to be able to use arbitrary chords (with 1 to n notes) across an entire 88-note keyboard.
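For anyone curious, here's the gist of what I'm doing, sketched in Python rather than as a patch (the function name and the hyphen delimiter are just my choices for illustration): sort the held note numbers and join them into one string, and that string becomes the coll key.

# Minimal sketch (Python, not Max) of the chord-to-symbol idea.
# Sort the held MIDI note numbers and join them into a single string key.
# The "-" delimiter is an arbitrary choice for this illustration.
def chord_to_symbol(notes):
    return "-".join(str(n) for n in sorted(set(notes)))

print(chord_to_symbol([67, 60, 64]))   # -> "60-64-67"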
I'm assuming the symbol table is implemented as a hash table. If so, it presumably has a predefined max size (or can it grow?) and a way of handling collisions. So I'm wondering how big the symbol table is and how collisions are handled (linear search in a bucket would be horrible, for example).
Anyone know the big O function for this particular implementation as the number of symbols increases?
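To make the worry concrete, here's the kind of structure I have in mind, as a toy Python sketch of a separate-chaining table (purely an assumption on my part about the general technique, not a claim about how Max's actual symbol table is built):

# Toy separate-chaining hash table, to illustrate the question.
# Lookups are O(1) on average while buckets stay short; with a fixed bucket
# count and millions of symbols, each lookup degrades into a linear scan of
# a long bucket, which is the scenario I'm worried about.
class ToySymbolTable:
    def __init__(self, n_buckets=4096):
        self.buckets = [[] for _ in range(n_buckets)]

    def intern(self, name):
        bucket = self.buckets[hash(name) % len(self.buckets)]
        for entry in bucket:          # linear search within one bucket
            if entry == name:
                return entry
        bucket.append(name)           # a collision just makes this bucket longer
        return name

If the table can grow (rehash) as symbols are added, interning stays effectively O(1); if it's fixed at, say, 4096 buckets and I push a million symbols in, each bucket averages a couple of hundred entries and every lookup pays for that.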
Thanks,
D
This may not be an answer to your question, but here it is anyway. I once learned from a much more experienced programmer than myself that if you have some sort of idea of what you are doing and a rough sense of how to implement it, you just move forward and start building it. Only if it turns out to be a bottleneck in terms of system performance do you start optimising it, which might mean implementing a different way of using memory.
In your question, you already want to implement the feature with a coll object. Why not try it out? If you uzi a coll with 100,000 items, its retrieval performance is still fantastic in my opinion. You can default to writing and reading the coll contents to a file (as opposed to storing it in the patch), so you have the data in a nice and handy text file. Then see if this works for you. It might just do. If it doesn't, change the implementation.
There probably is a max size to coll, but I haven't come across it. I don't care beans if it's implemented as a hash. That's why I use Max; it hides away all kinds of implementation details I don't care about. When I do care about those things, I use C++ or Python or whatever the taste of the day is.
Good luck anyways, it sounds like an awesome idea!
Thanks for responding. Yes, we all know premature optimization is generally a bad thing. It's a "badder" thing if one is inexperienced, which is why I always warn my students not to do it. (They still do, though, and get into trouble.)
--- EDIT
Don't know what happened here, but all the details I wrote of what I was actually doing did not seem to get posted.
If you're recognizing octave equivalence (which is pretty standard), then you've only got 4,095 different pitch-class sets (2^12 - 1), which is a large, but still manageable, number of symbols.
If you're really going to try to recognize every possible combination of 1–n pitches as a different collection, you would potentially have (2^88)-1 items, although you'd never get around to creating all those items.
Going back to octave equivalence and a maximum of 4,095 chords, it's generally more efficient to encode them as integers, using a bit of bit manipulation. I could write a book about this (in fact, I have written a book about this), but there are also some threads here discussing the technique. If there's interest I'll try to dig them up next time I'm on the forum.
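For anyone finding this later, the gist of the encoding (a Python sketch of the usual approach, not necessarily exactly what those older threads do): set one bit per pitch class, so every non-empty chord under octave equivalence maps to a unique integer between 1 and 4095.

# Bit k of the integer is set if pitch class k is present in the chord.
# Under octave equivalence this gives a unique int in 1..4095 for any non-empty chord.
def chord_to_pcset_int(notes):
    code = 0
    for n in notes:
        code |= 1 << (n % 12)   # fold the note onto its pitch class and set that bit
    return code

print(chord_to_pcset_int([60, 64, 67]))   # C major triad -> 145 (binary 000010010001)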
Thanks, Peter. I am NOT doing octave equivalence. (Well, I did that first to some extent, as it's cute to be able to see chord names as I play, although I was using two-octave equivalence since, for example, the chord C3 E3 F3 G3 is quite different from the chord F2 C3 E3 G3: a chord with a 2nd is different from a chord with a 9th.)
However, what I am actually doing now is capturing a solo performance (a piano piece, for example), the notes of which I'm currently storing in a coll. These performances will typically be around 5 to 10 minutes long. I'm using the absolute note numbers as keys (stored as symbols), with an integer as the value.
I then create multiple backing tracks, each of which is represented as a sequence of integer keys, with note numbers as the values. Those backing tracks might represent bass, guitars, drums in some cases, and any other orchestration or synth sounds I want. Then, as I play the solo performance back in real time, the chords I play will be recognized, producing integers that are then used to trigger the backing tracks. It won't work as well as Dannenberg's trumpet score-following system, but it will be good enough for what I need to do.
Certain chords might also be used to trigger patch changes and start some other events as well, so not just backing tracks. I would also be doing some playing tricks so that I could play in different areas of the keyboard that would sound the same but get mapped to completely different chord sequences. Further, I expect to use some tricks where, by pressing various pedals, I might add 128 or 256 to each note so that even though I'm playing in the same range, the "chords" would look completely different, thereby allowing me to have more variety with the backing tracks in real time.
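In case a concrete sketch helps, here's roughly the mapping I mean, in Python rather than a patch (the function name, the dictionary contents, and the 128-per-pedal offset are all just illustrative, not my actual setup): the same physical chord yields a different key depending on pedal state, and each key looks up its own trigger.

# Rough sketch of the lookup I'm describing; names and numbers are made up.
# Same fingers, different pedal state -> different key -> different trigger.
def performance_key(notes, pedal_bank=0):
    shifted = [n + 128 * pedal_bank for n in sorted(set(notes))]
    return "-".join(str(n) for n in shifted)

triggers = {
    "60-64-67": 1,        # chord played normally starts backing-track cue 1
    "188-192-195": 7,     # same chord with pedal bank 1 (each note +128) starts cue 7
}

print(triggers.get(performance_key([60, 64, 67])))                 # -> 1
print(triggers.get(performance_key([60, 64, 67], pedal_bank=1)))   # -> 7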
It's basically working right now in some simple tests but as I commit more resources to it, I wanted to make sure that I'm not going to get surprised due to degradation in insertion/retrieval time as the symbol table grows. Hence my original question about the size of the symbol table and how collisions are handled.
I'm aware that I could encode differently (anyone remember Warren's great book "Hacker's Delight", about doing all sorts of stuff with bit manipulation?), but I'd have to write an external, and if I must go in that direction, I'd rather know now, while I'm not under pressure and have the time to do it.
I said most of this earlier, but for some reason most of my previous response didn't get posted... weird.
I had a copy of your book once (about the use of set theory in music, if I remember correctly) but it got lost when I left IBM in 1999. So definitely interested in any pointers (no pun intended) you have to offer.
Cheers
There's a thread somewhere around here from about 18 months ago where I was asking the same questions - the guys at C74 did some tests and said that even into multi-millions of entries in the table, there was no noticeable degradation - I promptly put it to one side ;)
btw, the specific question had to do with logging timestamps down to ms resolution, generating potentially hundreds per second, each of which would create a new entry in the symbol table...
Brilliant. That answers the question for me.
Thank you.