Search code examples
javascripthashstring-hashing

Hash distribution, why is 0 always heavily weighted?


I wrote a quick canvas visualization to see the distribution of a hashing algorithm that I ported from C++ to JavaScript.

I'm seeing odd behavior in that no matter what I mod the hash by, 0 is heavily biased, in that it is selected exactly twice as often than most other numbers from the hashing function.

You can see the demo at: http://jsfiddle.net/x5L73/2/

and the original C++ algorithm: http://www.azillionmonkeys.com/qed/hash.html

And the part of the code I'm referring to, which is at the bottom of the jsFiddle:

// hash is 0 twice as often as anything else
var hash = app.Hash( word ) % ( 3499 )
  ,   b1 = 0|hash / 59
  ,   b2 =   hash % 59;

What is odd to me is that hash is zero twice as often as it is any other value, regardless of what I choose to mod it by. In this example, it is zero 1/3499 times, whereas any other number is hit 1/6998 times. This was determined through brute force testing via:

if( hash!==1234 ){ nonZero++; }else{ zero++ } // 1234 is a random number to check       
if( Math.random() < .00001 ){ console.log( zero, nonZero, 0|nonZero/zero ); }

What am I missing here???


Solution

  • Although this is a very fun fact that might come in handy wwhen dealing with Integers, just like hashes, the error is not due to the fact that JavaScript has a negative zero too...

    The original cause, as reported by OP was:

    It's because I was accidentally discarding all negative numbers in the visualization.

    Yep, negative numbers are not that trivial, our minds tend to ignore them sometimes - especially when they are focusing on a specific, difficult problem involving integers for a long time, just like trying to figure out good hashing methods, then switching to a seemingly lot easier task: displaying the results...

    So the real answer is: besides a negative zero, JavaScript has quite some more negative numbers too... Don't forget to take them into count - even in the easy task of visualization.

    TL;DR

    I leave this here, as this might come in handy for anyone having similar issues in the future, as it might cause a similar scenario.

    Look at this question: +0 and -0 in JavaScript (negative zero and positive zero in JavaScript)

    Quote:

    JavaScript uses IEEE 745 standard to represent numbers. From Wikipedia:

    Signed zero is zero with an associated sign. In ordinary arithmetic, −0 = +0 = 0. However, in computing, some number representations allow for the existence of two zeros, often denoted by −0 (negative zero) and +0 (positive zero). This occurs in some signed number representations for integers, and in most floating point number representations. The number 0 is usually encoded as +0, but can be represented by either +0 or −0.

    The IEEE 754 standard for floating point arithmetic (presently used by most computers and programming languages that support floating point numbers) requires both +0 and −0. The zeroes can be considered as a variant of the extended real number line such that 1/−0 = −∞ and 1/+0 = +∞, division by zero is only undefined for ±0/±0 and ±∞/±∞.