Search code examples
hashvowpalwabbithash-collision

Can two features in two different namspaces collide while hashing in vowpal wabbit?


Do all the namespaces have the same hash space or does each namespace have an independent hash space? Also, if they do have independent hash spaces, do they create new hash spaces for interaction features?


Solution

  • vw uses only one global hash-space. The size of this space is 2^b where b is the number of hash bits. By default b is 18, and it can be changed by passing -b <bits> argument to vw.

    So the answer to the 1st question is Yes, there's only one, common hash-space, and there can be collisions.

    Name spaces only change the starting point in the calculation of the hash, features in two separate name spaces can still collide.

    The hash function is basically:

    hash_func(string)

    where the string that's being passed to the hash_func is "<namespace>^<feature_name>"

    It is easy to check if your -b <bits> argument is too small: if by increasing -b ... you get a dramatically lower loss, then it is likely you had (many) collisions in the lower value setting.