Tags: performance, crystal-lang

Int32 vs Float64 performance in Crystal


I ran this benchmark and was very surprised to see that Crystal's performance is almost the same for Int32 and Float64 operations.

$ crystal benchmarks/int_vs_float.cr --release
  int32 414.96M (  2.41ns) (±14.81%)  0.0B/op        fastest
float64 354.27M (  2.82ns) (±12.46%)  0.0B/op   1.17× slower

Is there some weird side effect in my benchmark code?

require "benchmark"

res = 0
res2 = 0.0

Benchmark.ips do |x|
  x.report("int32") do
    a = 128973 / 119236
    b = 119236 - 128973

    d = 117232 > 123462 ? 117232 * 123462 : 123462 / 117232

    res = a + b + d
  end

  x.report("float64") do
    a = 1.28973 / 1.19236
    b = 1.19236 - 1.28973

    d = 1.17232 > 1.23462 ? 1.17232 * 1.23462 : 1.23462 / 1.17232

    res2 = a + b + d
  end
end

puts res
puts res2


Solution

  • First of all, / in Crystal is float division, so this is largely comparing floats:

    typeof(a) # => Float64
    typeof(b) # => Int32
    typeof(d) # => (Float64 | Int32)
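
    To make the distinction concrete, here is a tiny standalone illustration (my own, not from the original answer) contrasting / and // on the question's literals:

    p 128973 / 119236           # => 1.0816... (Float64: / is float division)
    p 128973 // 119236          # => 1         (Int32: // is integer division)
    p typeof(128973 / 119236)   # => Float64
    p typeof(128973 // 119236)  # => Int32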
    

    If we fix the benchmark to use integer division, //, I get:

      int32 631.35M (  1.58ns) (± 5.53%)  0.0B/op   1.23× slower
    float64 773.57M (  1.29ns) (± 3.21%)  0.0B/op        fastest
    

    Still no real difference, within the error margin. Why is that? Let's dig deeper. First, we can extract the example into a non-inlinable function and make sure to call it so Crystal doesn't just ignore it:

    @[NoInline]
    def calc
      a = 128973 // 119236
      b = 119236 - 128973
      d = 117232 > 123462 ? 117232 * 123462 : 123462 // 117232
    
      a + b + d
    end
    p calc
    

    Then we can build this with crystal build --release --no-debug --emit llvm-ir to obtain an .ll file with the optimized LLVM IR. We dig out our calc function and see something like this:

    define i32 @"*calc:Int32"() local_unnamed_addr #19 {
    alloca:
      %0 = tail call i1 @llvm.expect.i1(i1 false, i1 false)
      br i1 %0, label %overflow, label %normal6
    
    overflow:                                         ; preds = %alloca
      tail call void @__crystal_raise_overflow()
      unreachable
    
    normal6:                                          ; preds = %alloca
      ret i32 -9735
    }
    

    Where have all our calculations gone? LLVM did them at compile time because they were all constants! We can repeat the experiment with the Float64 example:

    define double @"*calc:Float64"() local_unnamed_addr #11 {
    alloca:
      ret double 0x40004CAA3B35919C
    }
    

    A little less boilerplate, hence it being slightly faster, but again, everything is precomputed!
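
    As a sanity check (my own arithmetic, not part of the original answer), both folded constants match what the source expressions evaluate to:

    # Integer version: the ternary takes the false branch, since 117232 > 123462 is false.
    a = 128973 // 119236   # => 1
    b = 119236 - 128973    # => -9737
    d = 123462 // 117232   # => 1
    p a + b + d            # => -9735, the constant returned by *calc:Int32

    # Float version: the hex literal 0x40004CAA3B35919C decodes to roughly 2.0374.
    fa = 1.28973 / 1.19236
    fb = 1.19236 - 1.28973
    fd = 1.23462 / 1.17232
    p fa + fb + fd         # => ~2.0374, the constant returned by *calc:Float64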

    I'll end the exercise here. Further research for the reader:

    • What happens if we try to introduce non-constant terms into all expressions? (A possible starting point is sketched after this list.)
    • Is the premise even sound that 32-bit integer operations should be any faster or slower than 64-bit IEEE 754 floating-point operations on a modern 64-bit CPU?
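
    A minimal starting point for the first question (my own sketch, not from the original answer): feed the operands in at runtime, for example via ARGV, so LLVM can no longer fold the expressions at compile time. The variable names and default values below are placeholders.

    require "benchmark"

    # Runtime inputs defeat constant folding; the defaults keep the script runnable as-is.
    x = (ARGV[0]? || "117232").to_i
    y = (ARGV[1]? || "123462").to_i
    fx = x / 100_000.0
    fy = y / 100_000.0

    res_i = 0
    res_f = 0.0

    Benchmark.ips do |bm|
      bm.report("int32") do
        a = x // y
        b = y - x
        # Wrapping operators (&*, &+) keep overflow checks from skewing the comparison
        # if large values are passed in.
        d = x > y ? x &* y : y // x
        res_i = a &+ b &+ d
      end

      bm.report("float64") do
        a = fx / fy
        b = fy - fx
        d = fx > fy ? fx * fy : fy / fx
        res_f = a + b + d
      end
    end

    puts res_i
    puts res_f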