
Efficiency of Arrays vs Ranges in Ruby


While working on something recently, I started to think about the efficiency of Arrays and Ranges in Ruby. I tried to research this but found very little information on it, or even on how I could test it myself.

I came across some code that checks which range an HTTP status code falls in, and it's written something like this:

SUCCESS = (200...300)
REDIRECTION = (300...400)
status_code = 302 # example value for illustration

if SUCCESS.include?(status_code)
  status = 'success'
elsif REDIRECTION.include?(status_code)
  status = 'redirection'
end
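For reference, this kind of classification is often written with a case expression instead: each when clause calls Range#===, which for Integer ranges is a bounds check rather than a scan over the elements. This is a minimal sketch; the helper name status_category and the CLIENT_ERROR/SERVER_ERROR constants are hypothetical additions, not part of the code above.

```ruby
# Constants mirroring the snippet above; `...` excludes the end value.
SUCCESS      = (200...300)
REDIRECTION  = (300...400)
CLIENT_ERROR = (400...500) # hypothetical, added for illustration
SERVER_ERROR = (500...600) # hypothetical, added for illustration

# Hypothetical helper: `when SUCCESS` evaluates SUCCESS === status_code,
# which compares status_code against the Range's two bounds.
def status_category(status_code)
  case status_code
  when SUCCESS      then 'success'
  when REDIRECTION  then 'redirection'
  when CLIENT_ERROR then 'client_error'
  when SERVER_ERROR then 'server_error'
  else 'unknown'
  end
end

puts status_category(204) # prints "success"
puts status_category(302) # prints "redirection"
```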

This got me thinking that it seems wasteful to use 200...300 when we essentially only need 200...207. Would there be a big efficiency difference here, if any at all?

Also, what about the 4XX codes? Since they are not always a contiguous run, I wondered whether I should turn them into an array, so I could write it one of two ways:

As a straight range:

CLIENT_ERROR = (400..429)

or as an array:

CLIENT_ERROR = [*(400...419), 422, 429]
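A possible middle ground, assumed here rather than taken from the question, is to keep the non-contiguous set without expanding it: store one Range plus the two stray Integers, and test each entry with ===. Range#=== does a bounds check and Integer#=== is plain equality, so both kinds of entry work. The names CLIENT_ERROR_CODES and client_error? are hypothetical.

```ruby
# Hypothetical: the same codes as [*(400...419), 422, 429],
# stored as 3 entries instead of 21 Integers.
CLIENT_ERROR_CODES = [400..418, 422, 429].freeze

# True if status_code matches any entry.
# Range#=== checks the bounds; Integer#=== checks equality.
def client_error?(status_code)
  CLIENT_ERROR_CODES.any? { |entry| entry === status_code }
end

p client_error?(404) # => true
p client_error?(422) # => true
p client_error?(420) # => false
```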

I'm assuming the first option is the better and more efficient approach, but I'm not sure how to validate that, so any advice or input on this would be greatly appreciated.
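One way to check the memory side of the two options is ObjectSpace.memsize_of from the objspace extension in Ruby's standard library. It reports an approximate size for a single object, excluding objects it references; that is not a problem here because small Integers are immediate values in CRuby. A rough sketch; exact byte counts depend on Ruby version and platform, so none are listed:

```ruby
require 'objspace'

# The two options from the question (inclusive range so 429 is covered).
as_range = (400..429)                # stores its endpoints and an exclude-end flag
as_array = [*(400...419), 422, 429]  # stores all 21 Integers

# Approximate size of each object itself, as a hint rather than an exact figure.
range_bytes = ObjectSpace.memsize_of(as_range)
array_bytes = ObjectSpace.memsize_of(as_array)

puts "Range: #{range_bytes} bytes"
puts "Array (#{as_array.size} elements): #{array_bytes} bytes"
```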


Solution

  • TL;DR

    Ranges are generally faster and more memory-efficient than Arrays reified from them. However, results for specific use cases may vary.

    If in doubt, benchmark. You can use irb's relatively new measure command or the Benchmark module to compare and contrast different approaches. In general, reifying a Range as an Array takes more memory and is slower than comparing against a Range (or even a small Array of Range objects), but unless this code runs in a tight loop, optimizing it is probably premature.

    Benchmarks

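    Using Ruby 3.1.0, the Range approach ran roughly 36x faster than the Array approach on my system (about 0.020s vs. 0.740s of real time in the results below). Note that Array#include? compares each element with ==, and a Range is never == to an Integer, so [200..299, 400..499].include?(404) returns false instead of testing membership. The code below uses any? with cover? to perform the intended check; the timings shown were recorded with the plain include? call, so the corrected Range version may be somewhat slower, though it still avoids building a 200-element Array on every iteration. For example:

    require 'benchmark'
    
    n = 100_000
    
    Benchmark.bmbm do |x|
      x.report("Range") do
        n.times do
          client_error = [200..299, 400..499]
          # Array#include? would compare each Range to 404 with ==,
          # so test membership in each Range explicitly instead.
          client_error.any? { |range| range.cover?(404) }
        end
      end
    
      x.report("Array") do
        n.times do
          # Reify both Ranges into one 200-element Array.
          client_error = [*(200..299), *(400..499)]
          client_error.include? 404
        end
      end
    end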
    
    Rehearsal -----------------------------------------
    Range   0.022570   0.000107   0.022677 (  0.022832)
    Array   0.707742   0.041499   0.749241 (  0.750012)
    -------------------------------- total: 0.771918sec
    
                user     system      total        real
    Range   0.020184   0.000043   0.020227 (  0.020245)
    Array   0.701911   0.037541   0.739452 (  0.740037)
    

    While the overall total times are better with JRuby and TruffleRuby, the Range approach there is only about 3-7x faster than the Array approach. Meanwhile, Ruby 3.0.1 shows an approximately 37x speed improvement using a non-reified Range rather than an Array, so the Range approach is the clear winner either way.

    Your specific values will vary based on system specs, system load, and Ruby version and engine. For smaller values of n, I can't imagine it will make any practical difference, but you should definitely benchmark on your own systems to determine whether the juice is worth the squeeze.
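The benchmark in this answer rebuilds client_error on every iteration, so part of what it measures is allocating the Array. In the question's code, though, the collections are constants built once. A sketch of a benchmark under that assumption, comparing a Range, a precomputed Array, and a Set (the constant names and the probe value 404 are illustrative):

```ruby
require 'benchmark'
require 'set'

# Hypothetical constants, built once rather than inside the loop.
CLIENT_ERROR_RANGE = (400..499)
CLIENT_ERROR_ARRAY = CLIENT_ERROR_RANGE.to_a.freeze   # include? scans linearly
CLIENT_ERROR_SET   = CLIENT_ERROR_RANGE.to_set.freeze # include? is a hash lookup

N = 100_000
PROBE = 404 # arbitrary code inside the range

Benchmark.bmbm do |x|
  x.report("Range") { N.times { CLIENT_ERROR_RANGE.cover?(PROBE) } }
  x.report("Array") { N.times { CLIENT_ERROR_ARRAY.include?(PROBE) } }
  x.report("Set")   { N.times { CLIENT_ERROR_SET.include?(PROBE) } }
end
```

The probe value matters too: Array#include? stops at the first match, so 404 (near the front) is close to a best case for the Array, while 499 would be close to its worst case.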