fortran, intel-fortran

Optimization defeats robustness measures


I am currently investigating robust methods for the summation of arrays and have implemented the algorithm published by Shewchuk in "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". While the implementation works as expected with gfortran, ifort optimizes the countermeasures away.

To give some context, here is my code:

module test_mod
contains
  function shewchukSum( array ) result(res)
    implicit none
    real,intent(in) :: array(:)
    real            :: res
    integer         :: xIdx, yIdx, i, nPartials
    real            :: partials(100), hi, lo, x, y

    nPartials = 0
    do xIdx=1,size(array)
      i = 0
      x = array(xIdx)

      ! Calculate the partial sums
      do yIdx=1,nPartials
        y = partials(yIdx)
        hi = x + y
        if ( abs(x) < abs(y) ) then
          lo = x - (hi - y)
        else
          lo = y - (hi - x)
        endif
        x = hi

        ! If a round-off error occurred, store it. Exact comparison intended
        if ( lo == 0. ) cycle
        i = i + 1 ; partials(i) = lo
      enddo ! yIdx
      nPartials = i + 1 ; partials( nPartials ) = x
    enddo ! xIdx

    res = sum( partials(:nPartials) )
  end function
end module

And the calling test program is

program test
  use test_mod
  implicit none
  print *,        sum([1.e0, 1.e16, 1.e0, -1.e16])
  print *,shewchukSum([1.e0, 1.e16, 1.e0, -1.e16])
end program

Compilation with gfortran produces the correct result at all optimization levels:

./a.out 
   0.00000000    
   2.00000000   

ifort, however, produces zeros at all optimization levels above -O0:

./a.out 
   0.00000000
   0.00000000

I debugged the code down to the assembly level and found that ifort optimizes away the calculation of lo and the operations after if ( lo == 0. ) cycle .
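
As an illustration (this tiny program is only a sketch I am adding here for the question, with my own naming, and is not part of the module above), the error term of a single two-sum is exactly the quantity that disappears: with x = 1. and y = 1.e16 the recovered round-off lo should be 1., but a compiler that treats x - ((x + y) - y) as algebraically zero drops it.

program twosum_demo
  implicit none
  real :: x, y, hi, lo
  x  = 1.e0
  y  = 1.e16
  hi = x + y          ! rounds to 1.e16 in single precision
  lo = x - (hi - y)   ! recovered round-off error; should be 1.00000000
  print *, hi, lo     ! under value-unsafe optimization lo may print as 0.
end program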

Is there a way to force ifort to perform the complete operation at all optimization levels? This summation is a critical part of my calculations, and I want it to run as fast as possible. For comparison, gfortran at -O2 executes this code roughly eight to ten times faster than ifort at -O0 (measured for arrays with more than 100,000 elements).


Solution

  • When it comes to floating point operations, ifort's defaults generally favour performance over strict correctness.

    There are a number of options to control the floating point behaviour. Using ifort 16 and the option -assume protect_parens I get the expected behaviour even at higher optimization levels.

    Additionally, there are the options -fp-model precise and -fp-model source (the latter implies -assume protect_parens), which may also be of interest to you. The default for -fp-model is fast=1, which "allows value-unsafe optimizations".

    Naturally, these options may have an impact on performance, so the other settings controlling floating point behaviour are also worth considering (a sample compile line is sketched below).

    Much further detail can be found in an Intel publication.
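
    For reference, compile lines using these options could look like the following (the file name test.f90 is an assumption here, taken to hold both the module and the test program):

      ifort -O2 -assume protect_parens test.f90
      ./a.out

    or, with the stricter floating point model,

      ifort -O2 -fp-model source test.f90
      ./a.out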