johnpitcairn

hashtab_new(n) returns a t_hashtab with n slots. What happens when those slots are full?

Is the size checked on every insert operation and dynamically resized (new larger hashtab, copy all keys, free old hashtab) when the hashtab is more than X% full? Or do I need to handle this myself?

hashtab_new(n) returns a t_hashtab with n slots. What happens when those slots are full? 

Is the size checked on every insert operation and dynamically resized (new larger hashtab, copy all keys, free old hashtab) when the hashtab is more than X% full? Or do I need to handle this myself?


t_hashtab-clarification-dynamic-resize

Think of a hashtab as an array of size N, containing linked lists. The size N determines how many hash index possibilities there are, but the linked lists are grown as necessary, according to the data stored in the hash table.

In any case, hashtab_new(0) will return a hashtab with a default, good-for-most-purposes, number of slots (59, at the time of this writing), and you probably don't need to worry about it.

Quote: Jeremy Bernstein wrote on Mon, 25 February 2008 21:21

----------------------------------------------------

> Think of a hashtab as an array of size N, containing linked lists. The size N determines how many hash index possibilities there are, but the linked lists are grown as necessary, according to the data stored in the hash table.

Thanks. So the number of index slots isn't ever grown?

> In any case, hashtab_new(0) will return a hashtab with a default, good-for-most-purposes, number of slots (59, at the time of this writing), and you probably don't need to worry about it.

I could potentially be looking at storing anything from 3 to thousands of uniquely-named items in the hashtab. So at the higher end, if the number of index slots isn't grown, I'll be looking at a lot of index collisions (with corresponding performance penalty), right?

If it does matter, is there a better rule of thumb than (0) for specifying the init size based on what I think the likely scale of the hashtab could eventually become?

Quote: Jeremy Bernstein wrote on Mon, 25 February 2008 21:21
----------------------------------------------------
> Think of a hashtab as an array of size N, containing linked lists. The size N determines how many hash index possibilities there are, but the linked lists are grown as necessary, according to the data stored in the hash table.

If it does matter, is there a better rule of thumb than (0) for specifying the init size based on what I think the likely scale of the hashtab could eventually become?


On Feb 25, 2008, at 1:18 PM, John Pitcairn wrote:

> Thanks. So the number of index slots isn't ever grown?

> I could potentially be looking at storing anything from 3 to 

> thousands of uniquely-named items in the hashtab. So at the higher 

> end, if the number of index slots isn't grown, I'll be looking at a 

> lot of index collisions (with corresponding performance penalty), 

You are correct. The more slots, the fewer collisions.

> If it does matter, is there a better rule of thumb than (0) for 

> specifying the init size based on what I think the likely scale of 

Depends on your needs for memory/speed tradeoff, and how many entries 

you expect. For example in the registered objects hashtab, we use a 

hash of ~2000 slots, and for an individual object's obex hashtab we 

use a hashtab of 13 slots. In general I like to use a number of slots 

that is one or two orders of magnitude smaller than the expected data 

set (i.e. divided by 10-100). Make sure it's a prime number for more 

> I could potentially be looking at storing anything from 3 to  
> thousands of uniquely-named items in the hashtab. So at the higher  
> end, if the number of index slots isn't grown, I'll be looking at a  
> lot of index collisions (with corresponding performance penalty),  
> right?

> If it does matter, is there a better rule of thumb than (0) for  
> specifying the init size based on what I think the likely scale of  
> the hashtab could eventually become?

Depends on your needs for memory/speed tradeoff, and how many entries  
you expect. For example in the registered objects hashtab, we use a  
hash of ~2000 slots, and for an individual object's obex hashtab we  
use a hashtab of 13 slots. In general I like to use a number of slots  
that is one or two orders of magnitude smaller than the expected data  
set (i.e. divided by 10-100). Make sure it's a prime number for more  
even hashing(i.e. less clumping).

t_hashtab clarification: dynamic resize?