Search code examples
chashhashtable

Hashtable separate chaining in C


I’m building a hashtable in C using open hashing (separate chaining) to store words. I’m unconcerned by the order in which I will store words with the same hash key.

Currently, I have a pointer to a struct (struct dict * d) with my hashtable (struct item * arr). More specifically, this table is an array of items (struct item) containing a word (char * word) and a pointer (struct item * next).

I’m unclear about two aspects:

1. When chaining words together after collision (inserting new item), should I insert the element at the beginning or at the end of the linked list?

I’ve seen it done both ways, but the latter seems more popular. However, the former seems quicker to me as I only need to set the pointer of my first item to my new item, and its pointer to null. I don’t have to do any pointer chasing (i.e. travel through my linked list until I find the null pointer).

2. Should my hashtable be an array of pointers to items (struct item), or simply an array of items (struct item), as I have done?

In other words, should the very first item for a specific hash key be inserted in the first cell (an empty cell), or should there already be a pointer in that cell which we will point to this new item?


Solution

  • For 1. it really shouldn’t matter if you prepend or append to the list. If you keep the load small, the chains are short, and you shouldn’t see any noticeable difference in access performance. If you keep the table small and the load gets high, you might want to look into different strategies. The access pattern might matter then. For example, if you are more likely to look up recently inserted values, you want them early in the list, so then it is better to prepend. But with a hash table, it is better to keep the load small if you can, and then it shouldn’t matter.

    For 2. either will also work. If your table is an array of pointers, NULL for empty chains, a simple recursive linked list implementation will work nicely. Make your list functions take a list as an argument and make insert and delete return a new list. Either argument or return value can be NULL. Then do something like tbl[bin] = insert(tbl[bin], val) or tbl[bin] = delete(tbl[bin], val). If the chains are short, you don’t have to worry too much about recursion overhead. In any case, you don’t need recursion for looking up a value or inserting if it is just prepending, so it is only delete where you don’t get tail recursion anyway. The benefit you get from having an array of links is either that you get a dummy element at the front of the list, which simplifies non-recursive list implementations by avoiding special cases for empty lists, or you avoid following a pointer to access the first element in the chain once you look up the bin. For the latter, you need a way to distinguish an empty chain from a chain with one element, though. It is hardly worth it, and if you want to avoid jumping along linked lists, open addressing or some other collision strategy might be better.