Tags: performance, crystal-lang

Int32 vs Float64 performance in Crystal


I ran this benchmark and was very surprised to see that Crystal's performance is almost the same for Int32 and Float64 operations.

$ crystal benchmarks/int_vs_float.cr --release
  int32 414.96M (  2.41ns) (±14.81%)  0.0B/op        fastest
float64 354.27M (  2.82ns) (±12.46%)  0.0B/op   1.17× slower

Is there some weird side effect in my benchmark code?

require "benchmark"

res = 0
res2 = 0.0

Benchmark.ips do |x|
  x.report("int32") do
    a = 128973 / 119236
    b = 119236 - 128973

    d = 117232 > 123462 ? 117232 * 123462 : 123462 / 117232

    res = a + b + d
  end

  x.report("float64") do
    a = 1.28973 / 1.19236
    b = 1.19236 - 1.28973

    d = 1.17232 > 1.23462 ? 1.17232 * 1.23462 : 1.23462 / 1.17232

    res2 = a + b + d
  end
end

puts res
puts res2


Solution

  • First of all, / in Crystal is float division, so this is largely comparing floats:

    typeof(a) # => Float64
    typeof(b) # => Int32
    typeof(d) # => (Float64 | Int32)
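
    To make the distinction concrete, here is a tiny standalone illustration (my own, not from the original answer) contrasting / and // on the question's literals:

    p 128973 / 119236           # => 1.0816... (Float64: / is float division)
    p 128973 // 119236          # => 1         (Int32: // is integer division)
    p typeof(128973 / 119236)   # => Float64
    p typeof(128973 // 119236)  # => Int32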
    

    If we fix the benchmark to use integer division, //, I get:

      int32 631.35M (  1.58ns) (± 5.53%)  0.0B/op   1.23× slower
    float64 773.57M (  1.29ns) (± 3.21%)  0.0B/op        fastest
    

    Still no real difference, within the error margin. Why is that? Let's dig deeper. First, we can extract the example into a non-inlinable function and make sure to call it so Crystal doesn't just ignore it:

    @[NoInline]
    def calc
      a = 128973 // 119236
      b = 119236 - 128973
      d = 117232 > 123462 ? 117232 * 123462 : 123462 // 117232
    
      a + b + d
    end
    p calc
    

    Then we can build this with crystal build --release --no-debug --emit llvm-ir to obtain an .ll file with the optimized LLVM IR. We dig out our calc function and see something like this:

    define i32 @"*calc:Int32"() local_unnamed_addr #19 {
    alloca:
      %0 = tail call i1 @llvm.expect.i1(i1 false, i1 false)
      br i1 %0, label %overflow, label %normal6
    
    overflow:                                         ; preds = %alloca
      tail call void @__crystal_raise_overflow()
      unreachable
    
    normal6:                                          ; preds = %alloca
      ret i32 -9735
    }
    

    Where have all our calculations gone? LLVM did them at compile time because they were all constants! We can repeat the experiment with the Float64 example:

    define double @"*calc:Float64"() local_unnamed_addr #11 {
    alloca:
      ret double 0x40004CAA3B35919C
    }
    

    A little less boilerplate, hence it being slightly faster, but again, everything is precomputed!
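
    As a sanity check (my own arithmetic, not part of the original answer), both folded constants match what the source expressions evaluate to:

    # Integer version: the ternary takes the false branch, since 117232 > 123462 is false.
    a = 128973 // 119236   # => 1
    b = 119236 - 128973    # => -9737
    d = 123462 // 117232   # => 1
    p a + b + d            # => -9735, the constant returned by *calc:Int32

    # Float version: the hex literal 0x40004CAA3B35919C decodes to roughly 2.0374.
    fa = 1.28973 / 1.19236
    fb = 1.19236 - 1.28973
    fd = 1.23462 / 1.17232
    p fa + fb + fd         # => ~2.0374, the constant returned by *calc:Float64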

    I'll end the exercise here. Further research for the reader:

    • What happens if we try to introduce non-constant terms into all expressions? (A possible starting point is sketched after this list.)
    • Is the premise even sound that 32-bit integer operations should be any faster or slower than 64-bit IEEE 754 floating-point operations on a modern 64-bit CPU?
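
    A minimal starting point for the first question (my own sketch, not from the original answer): feed the operands in at runtime, for example via ARGV, so LLVM can no longer fold the expressions at compile time. The variable names and default values below are placeholders.

    require "benchmark"

    # Runtime inputs defeat constant folding; the defaults keep the script runnable as-is.
    x = (ARGV[0]? || "117232").to_i
    y = (ARGV[1]? || "123462").to_i
    fx = x / 100_000.0
    fy = y / 100_000.0

    res_i = 0
    res_f = 0.0

    Benchmark.ips do |bm|
      bm.report("int32") do
        a = x // y
        b = y - x
        # Wrapping operators (&*, &+) keep overflow checks from skewing the comparison
        # if large values are passed in.
        d = x > y ? x &* y : y // x
        res_i = a &+ b &+ d
      end

      bm.report("float64") do
        a = fx / fy
        b = fy - fx
        d = fx > fy ? fx * fy : fy / fx
        res_f = a + b + d
      end
    end

    puts res_i
    puts res_f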