While working on something recently, I started to think about the efficiency of Arrays and Ranges in Ruby. I tried to research this but could find very little information on it, or even on how I could test it myself.
I came across some code that checks what range an HTTP status code is in, and it's written something like this:
SUCCESS = (200...300)
REDIRECTION = (300...400)
if SUCCESS.include?(status_code)
  status = 'success'
elsif REDIRECTION.include?(status_code)
  status = 'redirection'
end
This got me thinking that it seems wasteful to use 200...300 when we essentially only need 200...207, but would there be a big efficiency difference, if any at all?
Also, what about the 4XX codes? They are not always a straight run, so it got me thinking that maybe I should turn the range into an array. I could write it one of two ways:
As a straight range
CLIENT_ERROR = (400...429)
or as an array
CLIENT_ERROR = [*(400...419), 422, 429]
I'm assuming the first option is the better and more efficient approach, but I'm not sure how to validate that, so any advice or input would be greatly appreciated.
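Ranges are generally faster and more memory-efficient than reifying Arrays: a numeric Range stores only its two endpoints and checks membership by comparing against them, so the width of the range (200...300 versus 200...207) makes no practical difference. However, specific use cases may vary.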
If in doubt, benchmark. You can use irb's relatively new measure command, or use the Benchmark module to compare and contrast different approaches. In general, reifying a Range as an Array takes more memory and is slower than comparing against a Range (or even a small Array of Range objects), but unless you loop over this code a lot, this seems like a premature optimization.
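As a rough sketch of the "small Array of Range objects" idea applied to the 4XX codes from the question: the constant below mixes a Range with individual Integers and tests each one with `===`. The method name `client_error?` is hypothetical, and the set of codes simply mirrors the question's example rather than any authoritative list.

```ruby
# Matchers for the client-error codes from the question: 400 through 418,
# plus 422 and 429. Only three small objects are stored, not 21 Integers.
CLIENT_ERROR = [400...419, 422, 429].freeze

# Hypothetical helper: true when any matcher accepts the status code.
# Range#=== checks against the endpoints; Integer#=== is plain equality.
def client_error?(status_code)
  CLIENT_ERROR.any? { |matcher| matcher === status_code }
end

client_error?(404) # => true
client_error?(429) # => true
client_error?(419) # => false (400...419 excludes its end)
```

This keeps the non-contiguous codes readable without reifying the whole run into an Array.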
Using Ruby 3.1.0, the Range approach is roughly 36 times faster on my system (about 0.020 s versus 0.740 s of real time in the run below). For example:
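require 'benchmark'

n = 100_000

Benchmark.bmbm do
  # Build an Array holding two Range objects on every iteration.
  _1.report("Range") do
    n.times do
      client_error = [200..299, 400..499]
      # Caveat: Array#include? compares elements with ==, so this compares
      # 404 against each Range object and returns false. A real membership
      # test would be client_error.any? { |range| range.cover?(404) }.
      client_error.include? 404
    end
  end

  # Reify both Ranges into a 200-element Array on every iteration.
  _1.report("Array") do
    n.times do
      client_error = [*(200..299), *(400..499)]
      client_error.include? 404
    end
  end
end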
Rehearsal -----------------------------------------
Range   0.022570   0.000107   0.022677 (  0.022832)
Array   0.707742   0.041499   0.749241 (  0.750012)
-------------------------------- total: 0.771918sec

            user     system      total        real
Range   0.020184   0.000043   0.020227 (  0.020245)
Array   0.701911   0.037541   0.739452 (  0.740037)
The overall times are better with JRuby and TruffleRuby, but on those engines Ranges are only about 3-7x faster than Arrays. Meanwhile, Ruby 3.0.1 shows an approximately 37x speedup using a non-reified Range rather than an Array, so the Range approach is the clear winner here either way.
Your specific values will vary based on system specs, system load, and Ruby version and engine. For smaller values of n, I can't imagine it will make any practical difference, but you should definitely benchmark against your own systems to determine if the juice is worth the squeeze.
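One thing worth checking when you run your own comparison: in the benchmark above, the "Range" case calls `Array#include?` on an Array of Ranges, which compares 404 against each Range with `==` and so never actually tests membership. The sketch below is one possible variant that does a real membership test and builds both collections once, the way constants in the question would be. The variable names are made up for illustration, and no timings are shown because they will depend on your system.

```ruby
require 'benchmark'

# Built once, as constants would be (hypothetical names).
ranges  = [200..299, 400..499].freeze
reified = [*(200..299), *(400..499)].freeze

# True when any Range covers the code (two comparisons per Range).
in_ranges = ->(code) { ranges.any? { |range| range.cover?(code) } }

# True when the flattened Array contains the code (linear scan).
in_array = ->(code) { reified.include?(code) }

n = 100_000

Benchmark.bmbm do |bm|
  bm.report("Ranges") { n.times { in_ranges.call(404) } }
  bm.report("Array")  { n.times { in_array.call(404) } }
end
```

With the Array built only once, most of its cost in the original benchmark (allocating 200 elements per iteration) disappears, so expect a smaller gap than the figures quoted above.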