Search code examples
hashhashtablelinear-probingdouble-hashing

Double Hashing vs Linear Hashing


I'm writing double hash table which only takes integer.

unsigned int DoubleHashTable::HashFunction1(unsigned int const data)
{
   return (data % GetTableSize());
}

unsigned int DoubleHashTable::HashFunction2(unsigned int const data, unsigned int count)
{
   return ((HashFunction1(data) + count * (5 - (data % 5)) % GetTableSize()));
}

and trying to insert data into table with SetData()

void DoubleHashTable::SetData(unsigned int const data)
{
   unsigned int probe = HashFunction1(data);

   if (m_table[probe].GetStatus())
   {
      unsigned int count = 1;
      while (m_table[probe].GetStatus() && count <= GetTableSize())
      {
         probe = HashFunction2(data, count);
         count++;
      }
   }

   m_table[probe].Insert(data);
}

After put 100 of integer items into table with size of 100, table shows me some of indexes are left as blank. I know, it will takes O(N) which is worst case. My question is, item should be inserted into table with no empty space even it takes worst case of search time, right? I can't find the problem of my functions.

Additional question. There are well-known algorithms for hash and purpose of double hashing is makes less collision as much as possible, H2(T) is backup for H1(T). But, if well-known hashing algorithm (like MD5, SHA and other, I'm not talking about security, just well-known algorithm) is faster and well-distribute, why we need a double hashing?

Thanks!


Solution

  • When testing hash functions, there may be high collisions with certain pathological inputs (=those which break your hash function). These inputs can be discovered by reversing the hash function which can lead to certain attacks (this is a real concern as internet routers have limited space for hash tables). Even with no adversary, the look up time of such a hash table after certain inputs can grow and even become linear in the worst case.

    Double hashing is a method of resolving hash collisions to try to solve the problem of linear growth on pathological inputs. Linear probing or open addressing are popular choices. However, the number of inputs must be much lower than the table size in these cases, unless your hash table can grow dynamically.

    To answer your second question (now that you have fixed your code on your own), in a nutshell, double hashing is better-suited for small hash tables, and single hashing is better-suited for large hash tables.