Search code examples
javascriptdata-structureshashhashtablev8

How to implement a robust hash table like v8


Looking to learn how to implement a hash table in a decent way in JavaScript.

I would like for it to be able to:

  1. Efficiently resolve collisions,
  2. Be space efficient, and
  3. Be unbounded in size (at least in principle, like v8 objects are, up to the size of the system memory).

From my research and help from SO, there are many ways to resolve collisions in hash tables. The way v8 does it is Quadratic probing:

The wikipedia algorithm implementing quadratic probing in JavaScript looks something like this:

var i = 0
var SIZE = 10000
var key = getKey(arbitraryString)
var hash = key % SIZE
if (hashtable[hash]) {
  while (i < SIZE) {
    i++
    hash = (key + i * i) % SIZE
    if (!hashtable[hash]) break
    if (i == SIZE) throw new Error('Hashtable full.')
  }
  hashtable[hash] = key
} else {
  hashtable[hash] = key
}

The elements that are missing from the wikipedia entry are:

  1. How to compute the hash getKey(arbitraryString). Hoping to learn how v8 does this (not necessarily an exact replica, just along the same lines). Not being proficient in C it looks like the key is an object, and the hash is a 32 bit integer. Not sure if the lookup-cache.h is important.
  2. How to make it dynamic so the SIZE constraint can be removed.
  3. Where to store the final hash, and how to compute it more than once.

V8 allows you to specify your own "Shape" object to use in the hash table:

// The hash table class is parameterized with a Shape.
// Shape must be a class with the following interface:
//   class ExampleShape {
//    public:
//     // Tells whether key matches other.
//     static bool IsMatch(Key key, Object* other);
//     // Returns the hash value for key.
//     static uint32_t Hash(Isolate* isolate, Key key);
//     // Returns the hash value for object.
//     static uint32_t HashForObject(Isolate* isolate, Object* object);
//     // Convert key to an object.
//     static inline Handle<Object> AsHandle(Isolate* isolate, Key key);
//     // The prefix size indicates number of elements in the beginning
//     // of the backing storage.
//     static const int kPrefixSize = ..;
//     // The Element size indicates number of elements per entry.
//     static const int kEntrySize = ..;
//     // Indicates whether IsMatch can deal with other being the_hole (a
//     // deleted entry).
//     static const bool kNeedsHoleCheck = ..;
//   };

But not sure what the key is and how they convert that key to the hash so keys are evenly distributed and the hash function isn't just a hello-world example.

The question is, how to implement a quick hash table like V8 that can efficiently resolve collisions and is unbounded in size. It doesn't have to be exactly like V8 but have the features outlined above.

In terms of space efficiency, a naive approach would do var array = new Array(10000), which would eat up a bunch of memory until it was filled out. Not sure how v8 handles it, but if you do var x = {} a bunch of times, it doesn't allocate a bunch of memory for unused keys, somehow it is dynamic.

I'm stuck here essentially:

var m = require('node-murmurhash')

function HashTable() {
  this.array = new Array(10000)
}

HashTable.prototype.key = function(value){
  // not sure if the key is actually this, or
  // the final result computed from the .set function,
  // and if so, how to store that.
  return m(value)
}

HashTable.prototype.set = function(value){
  var key = this.key(value)
  var array = this.array
  // not sure how to get rid of this constraint.
  var SIZE = 10000
  var hash = key % SIZE
  var i = 0
  if (array[hash]) {
    while (i < SIZE) {
      i++
      hash = (key + i * i) % SIZE
      if (!array[hash]) break
      if (i == SIZE) throw new Error('Hashtable full.')
    }
    array[hash] = value
  } else {
    array[hash] = value
  }
}

HashTable.prototype.get = function(index){
  return this.array[index]
}

Solution

  • This is a very broad question, and I'm not sure what exactly you want an answer to. ("How to implement ...?" sounds like you just want someone to do your work for you. Please be more specific.)

    How to compute the hash

    Any hash function will do. I've pointed out V8's implementation in the other question you've asked; but you really have a lot of freedom here.

    Not sure if the lookup-cache.h is important.

    Nope, it's unrelated.

    How to make it dynamic so the SIZE constraint can be removed.

    Store the table's current size as a variable, keep track of the number of elements in your hash table, and grow the table when the percentage of used slots exceeds a given threshold (you have a space-time tradeoff there: lower load factors like 50% give fewer collisions but use more memory, higher factors like 80% use less memory but hit more slow cases). I'd start with a capacity that's an estimate of "minimum number of entries you'll likely need", and grow in steps of 2x (e.g. 32 -> 64 -> 128 -> etc.).

    Where to store the final hash,

    That one's difficult: in JavaScript, you can't store additional properties on strings (or primitives in general). You could use a Map (or object) on the side, but if you're going to do that anyway, then you might as well use that as the hash table, and not bother implementing your own thing on top.

    and how to compute it more than once.

    That one's easy: invoke your hashing function again ;-)

    I just want a function getUniqueString(string)

    How about this:

    var table = new Map();
    var max = 0;
    
    function getUniqueString(string) {
      var unique = table.get(string);
      if (unique === undefined) {
        unique = (++max).toString();
        table.set(string, unique);
      }
      return unique;
    }
    

    For nicer encapsulation, you could define an object that has table and max as properties.