Search code examples
arraysrubyoptimizationlanguage-lawyer

Why aren't empty arrays and hashes interned?


Recently I found out that Ruby doesn't optimize [] and {} to be interned to point to a common shared object. Demo:

irb(main):001:0> [].object_id
=> 70284401361960
irb(main):002:0> [].object_id
=> 70284392762340 # different
irb(main):003:0> [].object_id
=> 70284124310100 # different

irb(main):005:0> {}.object_id
=> 70284392857480
irb(main):006:0> {}.object_id
=> 70284392870480 # different
irb(main):007:0> {}.object_id
=> 70284392904360 # different

I understand that often empty hashes and array literals are used to initialize values that will be immediately mutated. But this happens even if you freeze them with [].freeze.object_id or {}.freeze.object_id instead.

Contrast this with String, when the env var RUBYOPT is set to --enable-frozen-string-literal:

irb(main):001:0> ""
=> 70284400947400
irb(main):002:0> ""
=> 70284400947400 # same
irb(main):003:0> ""
=> 70284400947400 # same

Even you don't enable frozen string literals, if you call "".freeze.object_id instead, you'll get the same object id each time, though I suspect that initial "" literal is still allocated an intermediate string object that freeze is being called on.

In a performance-sensitive codebase (well, as performance-sensitive you can allow yourself to be while still using MRI lol) I've seen this workaround, which uses the following shared constants instead of [] or {} for cases when the hash or array doesn't need to be mutable:

module LessAllocations
  EMPTY_HASH = {}.freeze
  EMPTY_ARRAY = [].freeze

  # String literals are already frozen, use '' instead
  # EMPTY_STRING = ''
end

So my questions are:

  1. Is this a missed optimization opportunity?

    It seems like a peephole optimization could be written to intern frozen empty hash or array literals. Would need to be part of the language's semantics, i.e. would there be any observable differences, (obviously besides the behavior of object_id)?

  2. Are empty hash and array literals represented by tagged pointers? Do they even cause any allocation to happen?


Solution

  • Is this a missed optimization opportunity?

    To answer this question one would to have to do a survey of real world memory usage, but I believe it is unlikely. You could perform a little experiment...

    class Array
      EMPTY_ARRAY = [].freeze
      
      def freeze
        empty? ? EMPTY_ARRAY : super
      end
    end
    

    Empty objects are very small. Having so many that they use significant memory compared to everything else your program is using memory for is an edge case.

    I understand that often empty hashes and array literals are used to initialize values that will be immediately mutated.

    For that reason, adding copy-on-write to empty hashes and arrays might slow things down.

    But this happens even if you do [].freeze.object_id or {}.freeze.object_id instead.

    Freezing them means you know ahead of time they will remain empty, that's extremely rare. Having so many known empty hashes and arrays that it becomes a performance issue is an edge case. The constant work-around seems fine.