Forums > Dev

t_hashtab clarification: dynamic resize?

February 25, 2008 | 1:13 am

hashtab_new(n) returns a t_hashtab with n slots. What happens when those slots are full?

Is the size checked on every insert operation and dynamically resized (new larger hashtab, copy all keys, free old hashtab) when the hashtab is more than X% full? Or do I need to handle this myself?


February 25, 2008 | 8:21 am

Think of a hashtab as an array of size N, containing linked lists. The size N determines how many hash index possibilities there are, but the linked lists are grown as necessary, according to the data stored in the hash table.

In any case, hashtab_new(0) will return a hashtab with a default, good-for-most-purposes, number of slots (59, at the time of this writing), and you probably don’t need to worry about it.

Jeremy


February 25, 2008 | 9:18 pm

Quote: Jeremy Bernstein wrote on Mon, 25 February 2008 21:21
—————————————————-
> Think of a hashtab as an array of size N, containing linked lists. The size N determines how many hash index possibilities there are, but the linked lists are grown as necessary, according to the data stored in the hash table.

Thanks. So the number of index slots isn’t ever grown?

> In any case, hashtab_new(0) will return a hashtab with a default, good-for-most-purposes, number of slots (59, at the time of this writing), and you probably don’t need to worry about it.

I could potentially be looking at storing anything from 3 to thousands of uniquely-named items in the hashtab. So at the higher end, if the number of index slots isn’t grown, I’ll be looking at a lot of index collisions (with corresponding performance penalty), right?

If it does matter, is there a better rule of thumb than (0) for specifying the init size based on what I think the likely scale of the hashtab could eventually become?


February 25, 2008 | 10:47 pm

On Feb 25, 2008, at 1:18 PM, John Pitcairn wrote:

> Thanks. So the number of index slots isn’t ever grown?

No, and it’s not dynamically settable.

> I could potentially be looking at storing anything from 3 to
> thousands of uniquely-named items in the hashtab. So at the higher
> end, if the number of index slots isn’t grown, I’ll be looking at a
> lot of index collisions (with corresponding performance penalty),
> right?

You are correct. The more slots, the fewer collisions.

> If it does matter, is there a better rule of thumb than (0) for
> specifying the init size based on what I think the likely scale of
> the hashtab could eventually become?

Depends on your needs for memory/speed tradeoff, and how many entries
you expect. For example in the registered objects hashtab, we use a
hash of ~2000 slots, and for an individual object’s obex hashtab we
use a hashtab of 13 slots. In general I like to use a number of slots
that is one or two orders of magnitude smaller than the expected data
set (i.e. divided by 10-100). Make sure it’s a prime number for more
even hashing(i.e. less clumping).

-Joshua


February 26, 2008 | 12:31 am

Thanks, just what I needed to know.


Viewing 5 posts - 1 through 5 (of 5 total)