Search code examples
performancef#integer-overflow

F# Performance Impact of Checked Calcs?


Is there a performance impact from using the Checked module? I've tested it out with sequences of type int and see no noticeable difference. Sometimes the checked version is faster and sometimes unchecked is faster, but generally not by much.

Seq.initInfinite (fun x-> x) |> Seq.item 1000000000;;
Real: 00:00:05.272, CPU: 00:00:05.272, GC gen0: 0, gen1: 0, gen2: 0
val it : int = 1000000000
open Checked

Seq.initInfinite (fun x-> x) |> Seq.item 1000000000;;
Real: 00:00:04.785, CPU: 00:00:04.773, GC gen0: 0, gen1: 0, gen2: 0
val it : int = 1000000000

Basically I'm trying to figure out if there would be any downside to always opening Checked. (I encountered an overflow that wasn't immediately obvious, so I'm now playing the role of the jilted lover who doesn't want another broken heart.) The only non-contrived reason I can come up with for not always using Checked is if there were some performance hit, but I haven't seen one yet.


Solution

  • When you measure performance it's usually not a good idea to include Seq as Seq adds lots of overhead (at least compared to int operations) so you risk that most of the time is spent in Seq, not in the code you like to test.

    I wrote a small test program for (+):

    let clock = 
      let sw = System.Diagnostics.Stopwatch ()
      sw.Start ()
      fun () ->
        sw.ElapsedMilliseconds
    
    let dbreak () = System.Diagnostics.Debugger.Break ()
    
    let time a =
      let b = clock ()
      let r = a ()
      let n = clock ()
      let d = n - b
      d, r
    
    module Unchecked =
      let run c () =
        let rec loop a i =
          if i < c then
            loop (a + 1) (i + 1)
          else
            a
        loop 0 0
    
    module Checked =
      open Checked
    
      let run c () =
        let rec loop a i =
          if i < c then
            loop (a + 1) (i + 1)
          else
            a
        loop 0 0
    
    [<EntryPoint>]
    let main argv =
      let count     = 1000000000
      let testCases =
        [|
          "Unchecked" , Unchecked.run
          "Checked"   , Checked.run
        |]
    
      for nm, a in testCases do
        printfn "Running %s ..." nm
        let ms, r = time (a count)
        printfn "... it took %d ms, result is %A" ms r
    
      0
    

    The performance results are this:

    Running Unchecked ...
    ... it took 561 ms, result is 1000000000
    Running Checked ...
    ... it took 1103 ms, result is 1000000000
    

    So it seems some overhead is added by using Checked. The cost of int add should be less than the loop overhead so the overhead of Checked is higher than 2x maybe closer to 4x.

    Out of curiousity we can check the IL Code using tools like ILSpy:

    Unchecked:

        IL_0000: nop
        IL_0001: ldarg.2
        IL_0002: ldarg.0
        IL_0003: bge.s IL_0014
    
        IL_0005: ldarg.0
        IL_0006: ldarg.1
        IL_0007: ldc.i4.1
        IL_0008: add
        IL_0009: ldarg.2
        IL_000a: ldc.i4.1
        IL_000b: add
        IL_000c: starg.s i
        IL_000e: starg.s a
        IL_0010: starg.s c
        IL_0012: br.s IL_0000
    

    Checked:

        IL_0000: nop
        IL_0001: ldarg.2
        IL_0002: ldarg.0
        IL_0003: bge.s IL_0014
    
        IL_0005: ldarg.0
        IL_0006: ldarg.1
        IL_0007: ldc.i4.1
        IL_0008: add.ovf
        IL_0009: ldarg.2
        IL_000a: ldc.i4.1
        IL_000b: add.ovf
        IL_000c: starg.s i
        IL_000e: starg.s a
        IL_0010: starg.s c
        IL_0012: br.s IL_0000
    

    The only difference is that Unchecked uses add and Checked uses add.ovf. add.ovf is add with overflow check.

    We can dig even deeper by looking at the jitted x86_64 code.

    Unchecked:

    ; if i < c then
    00007FF926A611B3  cmp         esi,ebx  
    00007FF926A611B5  jge         00007FF926A611BD  
    ; i + 1
    00007FF926A611B7  inc         esi  
    ; a + 1
    00007FF926A611B9  inc         edi  
    ; loop (a + 1) (i + 1)
    00007FF926A611BB  jmp         00007FF926A611B3
    

    Checked:

    ; if i < c then
    00007FF926A62613  cmp         esi,ebx  
    00007FF926A62615  jge         00007FF926A62623  
    ; a + 1
    00007FF926A62617  add         edi,1  
    ; Overflow?
    00007FF926A6261A  jo          00007FF926A6262D  
    ; i + 1
    00007FF926A6261C  add         esi,1  
    ; Overflow?
    00007FF926A6261F  jo          00007FF926A6262D  
    ; loop (a + 1) (i + 1)
    00007FF926A62621  jmp         00007FF926A62613
    

    Now the reason for the Checked overhead is visible. After each operation the jitter inserts the conditional instruction jo which jumps to code that raises OverflowException if the overflow flag is set.

    This chart shows us that the cost of an integer add is less than 1 clock cycle. The reason it's less than 1 clock cycle is that modern CPU can execute certain instructions in parallel.

    The chart also shows us that branch that was correctly predicted by the CPU takes around 1-2 clock cycles.

    So assuming a throughtput of at least 2 the cost of two integer additions in the Unchecked example should be 1 clock cycle.

    In the Checked example we do add, jo, add, jo. Most likely CPU can't parallelize in this case and the cost of this should be around 4-6 clock cycles.

    Another interesting difference is that the order of additions changed. With checked additions the order of the operations matter but with unchecked the jitter (and the CPU) has a greater flexibility moving the operations possibly improving performance.

    So long story short; for cheap operations like (+) the overhead of Checked should be around 4x-6x compared to Unchecked.

    This assumes no overflow exception. The cost of a .NET exception is probably around 100,000x times more expensive than an integer addition.