Search code examples
algorithmlinked-listtime-complexityhashtablepseudocode

Hash table average complexity of functions


GOAL

The goal is to use a chained hash table, with a hashing function h that satisfies the Simple Uniform Hashing Assumption, to store some numbers (the same number, if inserted multiple times, is stored multiple times in the linked list associated with h(n)). We have 3 functions to implement: one that inserts a new number n and counts the number of occurrences of the number, one that deletes a number and counts the number of occurrences of the number and one that just counts the number of occurrences of the number.

NOTES

We have to use pseudocode (similar to python) and can use method ".head" that returns the head of the linked list and the functions LIST_PREPEND, LIST_SEARCH that prepend an item (given as a pointer) and search a key (returning the associated pointer) in the linked list. We can also use CHAINED_HASH_DELETE which deletes an item (given as a pointer) from the linked list associated to it. Note that the variable x always denotes a pointer to an item in one of the linked lists of the hash table, n is always an integer, which we use as key, and M is the chained hash table. The syntax and ""programming"" really don't matter, what matters is the algorithm.

MY CODE

COUNT(M, n): 
    count = 0
    x = M[h(n)].head
    if x == NIL: #the linked list is empty
        return 0
    else:
        while x != NIL: #search the list until its end
            if x.key == n:
                count = count + 1
            x = x.next
        return count

DELETE(M, n)
    x = LIST_SEARCH(M[h(n)], n)
    CHAINED_HASH_DELETE(M, x)
    COUNT(M, n)

INSERT(M, n):
    <let x be a pointer to n>
    LIST_PREPEND(M[h(n)], x)
    COUNT(M, n)

MY QUESTION

  1. The idea behind my code is probably a little bit inefficient because every time I have to re-count the number of occurrences. But, the average complexity of all the functions (COUNT, INSERT, DELETE) the should anyway be O(1+a) where a is the load factor, because LIST_SEARCH and COUNT have complexity O(1+a). Is the average complexity correct? Are the algorithms themselves correct?

  2. Another solution was to use an attribute .counter on the first item of the linked list in M[h(n)] and add 1 or subtract 1 each time I insert or delete an item. Since the resulting complexities are also O(1+a) because I have to find the first item equal to n, it shouldn't matter a lot in a theoretical scenario, correct?


Solution

  • Hashing is inherently deterministic. However, we can use probabilistic analysis to understand the average-case performance of hash table operations under the assumption of a uniform distribution of keys.

    Denoting the load factor by α=N/M where N is the number of entries in the hash table, M is the number of buckets. Under SUHA, each key is equally likely to hash to any bucket. Therefore, we expect the keys to be evenly distributed across the buckets. On average, there will be α entries per bucket, thus leading to average time complexity of lookups, insertion and deletion being O(1 + α). You might find this post, Simple Uniform Hashing Assumption and Worst-Case Complexity for Hash Tables insightful, which provides different takes on SUHA.

    However, this result implies that if the size M of the hash table is at least proportional to the number N of elements in the table, we have N = O(M). Hence, α = N/M = O(M)/M = O(1), and a lookup takes O(1) on average. For O(1) performance on lookups, α should therefore be bounded above by a small constant, preferably below 1.

    If α grows beyond an acceptable limit, we may choose to rehash the hash table. This involves enlarging the size of the table and re-inserting the existing entries into the new, larger table. Since M becomes larger relative to N, the load factor α decreases accordingly.

    For your second question, you're able to keep the cost of your count function to O(1), so do just that. By using a .counter attribute on the first item of the linked list, you can add 1 or subtract 1 each time you insert or delete an item.