t_hashtab clarification: dynamic resize?

Feb 25, 2008 at 1:13am

t_hashtab clarification: dynamic resize?

hashtab_new(n) returns a t_hashtab with n slots. What happens when those slots are full?

Is the size checked on every insert operation and dynamically resized (new larger hashtab, copy all keys, free old hashtab) when the hashtab is more than X% full? Or do I need to handle this myself?

#36028
Feb 25, 2008 at 8:21am

Think of a hashtab as an array of size N, containing linked lists. The size N determines how many hash index possibilities there are, but the linked lists are grown as necessary, according to the data stored in the hash table.

In any case, hashtab_new(0) will return a hashtab with a default, good-for-most-purposes, number of slots (59, at the time of this writing), and you probably don’t need to worry about it.

Jeremy

#123373
Feb 25, 2008 at 9:18pm

Quote: Jeremy Bernstein wrote on Mon, 25 February 2008 21:21
—————————————————-
> Think of a hashtab as an array of size N, containing linked lists. The size N determines how many hash index possibilities there are, but the linked lists are grown as necessary, according to the data stored in the hash table.

Thanks. So the number of index slots isn’t ever grown?

> In any case, hashtab_new(0) will return a hashtab with a default, good-for-most-purposes, number of slots (59, at the time of this writing), and you probably don’t need to worry about it.

I could potentially be looking at storing anything from 3 to thousands of uniquely-named items in the hashtab. So at the higher end, if the number of index slots isn’t grown, I’ll be looking at a lot of index collisions (with corresponding performance penalty), right?

If it does matter, is there a better rule of thumb than (0) for specifying the init size based on what I think the likely scale of the hashtab could eventually become?

#123374
Feb 25, 2008 at 10:47pm

On Feb 25, 2008, at 1:18 PM, John Pitcairn wrote:

> Thanks. So the number of index slots isn’t ever grown?

No, and it’s not dynamically settable.

> I could potentially be looking at storing anything from 3 to
> thousands of uniquely-named items in the hashtab. So at the higher
> end, if the number of index slots isn’t grown, I’ll be looking at a
> lot of index collisions (with corresponding performance penalty),
> right?

You are correct. The more slots, the fewer collisions.

> If it does matter, is there a better rule of thumb than (0) for
> specifying the init size based on what I think the likely scale of
> the hashtab could eventually become?

Depends on your needs for memory/speed tradeoff, and how many entries
you expect. For example in the registered objects hashtab, we use a
hash of ~2000 slots, and for an individual object’s obex hashtab we
use a hashtab of 13 slots. In general I like to use a number of slots
that is one or two orders of magnitude smaller than the expected data
set (i.e. divided by 10-100). Make sure it’s a prime number for more
even hashing(i.e. less clumping).

-Joshua

#123375
Feb 26, 2008 at 12:31am

Thanks, just what I needed to know.

#123376

You must be logged in to reply to this topic.