Recently I found out that Ruby doesn't optimize []
and {}
to be interned to point to a common shared object. Demo:
irb(main):001:0> [].object_id
=> 70284401361960
irb(main):002:0> [].object_id
=> 70284392762340 # different
irb(main):003:0> [].object_id
=> 70284124310100 # different
irb(main):005:0> {}.object_id
=> 70284392857480
irb(main):006:0> {}.object_id
=> 70284392870480 # different
irb(main):007:0> {}.object_id
=> 70284392904360 # different
I understand that often empty hashes and array literals are used to initialize values that will be immediately mutated. But this happens even if you freeze them with [].freeze.object_id
or {}.freeze.object_id
instead.
Contrast this with String
, when the env var RUBYOPT
is set to --enable-frozen-string-literal
:
irb(main):001:0> ""
=> 70284400947400
irb(main):002:0> ""
=> 70284400947400 # same
irb(main):003:0> ""
=> 70284400947400 # same
Even you don't enable frozen string literals, if you call "".freeze.object_id
instead, you'll get the same object id each time, though I suspect that initial ""
literal is still allocated an intermediate string object that freeze
is being called on.
In a performance-sensitive codebase (well, as performance-sensitive you can allow yourself to be while still using MRI lol) I've seen this workaround, which uses the following shared constants instead of []
or {}
for cases when the hash or array doesn't need to be mutable:
module LessAllocations
EMPTY_HASH = {}.freeze
EMPTY_ARRAY = [].freeze
# String literals are already frozen, use '' instead
# EMPTY_STRING = ''
end
So my questions are:
Is this a missed optimization opportunity?
It seems like a peephole optimization could be written to intern frozen empty hash or array literals. Would need to be part of the language's semantics, i.e. would there be any observable differences, (obviously besides the behavior of object_id
)?
Are empty hash and array literals represented by tagged pointers? Do they even cause any allocation to happen?
Is this a missed optimization opportunity?
To answer this question one would to have to do a survey of real world memory usage, but I believe it is unlikely. You could perform a little experiment...
class Array
EMPTY_ARRAY = [].freeze
def freeze
empty? ? EMPTY_ARRAY : super
end
end
Empty objects are very small. Having so many that they use significant memory compared to everything else your program is using memory for is an edge case.
I understand that often empty hashes and array literals are used to initialize values that will be immediately mutated.
For that reason, adding copy-on-write to empty hashes and arrays might slow things down.
But this happens even if you do [].freeze.object_id or {}.freeze.object_id instead.
Freezing them means you know ahead of time they will remain empty, that's extremely rare. Having so many known empty hashes and arrays that it becomes a performance issue is an edge case. The constant work-around seems fine.