Tags: ruby, arrays, performance, hash, jruby

Unexpected access performance differences between arrays and hashes


I have evaluated access times for a two-dimensional array, implemented as

  • an array of arrays
  • a hash of arrays
  • a hash with arrays as keys

My expectation was to see similar access times for all three. I also expected the measurements to yield similar results for MRI and JRuby.

However, for a reason that I do not understand, on MRI accessing elements within an array of arrays or within a hash of arrays is an order of magnitude faster than accessing elements of the hash keyed by arrays.

On JRuby, instead of being 10 times as expensive, hash access was about 50 times as expensive as with an array of arrays.

The results:

MRI (2.1.1):

                         user     system      total        real
hash of arrays:        1.300000   0.000000   1.300000 (  1.302235)
array of arrays:       0.890000   0.000000   0.890000 (  0.896942)
flat hash:            16.830000   0.000000  16.830000 ( 16.861716)

JRuby (1.7.5):

                           user     system      total        real
hash of arrays:        0.280000   0.000000   0.280000 (  0.265000)
array of arrays:       0.250000   0.000000   0.250000 (  0.182000)
flat hash:            77.450000   0.240000  77.690000 ( 75.156000)

Here are two of my benchmarks:

ary = (0...n).map { Array.new(n, 1) }   # n x n array of arrays, every element set to 1

bm.report('array of arrays:') do
  iterations.times do
    (0...n).each { |x|
      (0...n).each { |y|
        v = ary[x][y]   # read every element; the value itself is unused
      }
    }
  end
end


hash = {}
(0...n).each { |x|
  (0...n).each { |y|
    hash[[x, y]] = 1   # the key is the two-element array [x, y]
  }
}

prepared_indices = (0...n).each_with_object([]) { |x, ind|
  (0...n).each { |y|
    ind << [x, y]
  }
}

bm.report('flat hash:') do
  iterations.times do
    prepared_indices.each { |i|
      v = hash[i]   # look up by the prebuilt [x, y] key
    }
  end
end

All containers are initialized with the same numeric value and hold the same total number of elements. The arrays used as hash keys are built ahead of time so that only the element access is benchmarked.

Here is the complete code
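
The snippets above rely on n, iterations and bm being defined in the surrounding script. A minimal harness they could plug into, including a plausible hash-of-arrays benchmark, might look roughly like this (the sizes and the hash-of-arrays report are assumptions, not the original code):

require 'benchmark'

n          = 100    # assumed grid size
iterations = 100    # assumed repetition count

hash_of_arys = (0...n).each_with_object({}) { |x, h| h[x] = Array.new(n, 1) }

Benchmark.bm(18) do |bm|
  bm.report('hash of arrays:') do
    iterations.times do
      (0...n).each { |x|
        (0...n).each { |y|
          v = hash_of_arys[x][y]
        }
      }
    end
  end
  # ... the 'array of arrays:' and 'flat hash:' reports shown above go here
end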

I have consulted this thread and this article but still have no clue about the unexpected performance differences.

Why are the results so different from my expectations? What am I missing?


Solution

  • Consider the memory layout of an array of arrays, say with dimensions 3x3... you've got something like this:

    memory address       usage/content
    base                 [0][0]
    base+sizeof(int)     [0][1]
    base+2*sizeof(int)   [0][2]
    base+3*sizeof(int)   [1][0]
    base+4*sizeof(int)   [1][1]
    ...
    

    Given an array of dimensions [M][N], all that's needed to access the element at indices [i][j] is to add to the base memory address the data element size times (i * N + j)... a tiny bit of simple arithmetic, and therefore extremely fast.
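
    To make that arithmetic concrete, here is a tiny Ruby sketch of the row-major offset calculation (the sizes are example values; this illustrates the layout described above, not Ruby's actual Array internals):

        # Row-major layout: element [i][j] of an M x N array lives at offset i * N + j.
        rows, cols = 3, 3
        flat = (0...rows * cols).to_a        # stand-in for contiguous storage

        def offset(i, j, cols)
          i * cols + j                       # the tiny bit of simple arithmetic
        end

        p flat[offset(1, 0, cols)]           # [1][0] -> offset 1 * 3 + 0 = 3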

    Hashes are far more complicated data structures and inherently slower. With a hash you first need to hash the key. The harder the hash function works to make sure different keys scatter - statistically - pretty randomly across its output range even when the keys are similar, the slower it tends to be; if it doesn't make that effort, you get more collisions in the hash table and lose performance there instead. The hash value then has to be mapped onto the current hash table size (usually using "%"), and finally keys have to be compared to decide whether you've found the hoped-for key, a colliding element, or an empty element. It's a far more involved process than array indexing. You should probably do some background reading about hash functions and hash table implementations.
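
    As a rough sketch of those steps (a toy separate-chaining table for illustration, not how MRI actually implements Hash):

        TABLE_SIZE = 8
        table = Array.new(TABLE_SIZE) { [] }          # each slot holds [key, value] pairs

        def store(table, key, value)
          slot = key.hash % table.size                # hash the key, map it onto the table size
          table[slot] << [key, value]
        end

        def lookup(table, key)
          slot = key.hash % table.size
          pair = table[slot].find { |k, _| k == key } # compare keys to resolve collisions
          pair && pair.last
        end

        store(table, [1, 2], 1)
        p lookup(table, [1, 2])                       # => 1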

    The reason hashes are so often useful is that the key doesn't need to be numeric (you can always work out some formula to generate a number from arbitrary key data) and the keys don't need to be near-contiguous for memory efficiency. A hash table with memory capacity for, say, 5 integers can happily store the keys 1, 1000 and 12398239, whereas an array keyed on those values would waste a lot of virtual address space on all the indices in between, which hold no data anyway; and the more data packed into a memory page, the more cache hits you get.
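
    A quick illustration of that trade-off, using the key values mentioned above:

        sparse = { 1 => :a, 1000 => :b, 12398239 => :c }
        p sparse.size                   # => 3 entries stored

        dense = Array.new(12398240)     # an array keyed on the same values
        dense[1], dense[1000], dense[12398239] = :a, :b, :c
        p dense.size                    # => 12398240 slots, almost all of them nil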

    Further, be careful with benchmarks: when you do clearly repetitive work with unchanging values, overwriting the same variable, an optimiser may avoid it and you may not be timing what you think you are. It's good to use some run-time inputs (e.g. storing different values in the containers), accumulate some dependent result (e.g. summing the element accesses rather than overwriting a variable), and then output the result so any lazy evaluation is forced to complete. With things like JITs and VMs involved there can also be kinks in your timings as compilation kicks in or branch-prediction results are incorporated.
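
    For instance, the array-of-arrays benchmark above could be reworked along those lines (the size and the stored values are illustrative):

        require 'benchmark'

        n   = 1000
        ary = (0...n).map { |x| (0...n).map { |y| x + y } }   # run-time-dependent values

        Benchmark.bm(18) do |bm|
          bm.report('array of arrays:') do
            sum = 0
            (0...n).each { |x| (0...n).each { |y| sum += ary[x][y] } }
            puts sum      # use the accumulated result so the work can't be optimised away
          end
        end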