I am currently investigating robust methods for the summation of arrays, and implemented the algorithm published by Shewchuk in "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". While the implemented algorithm works as expected in gfortran
, ifort
optimizes the countermeasures away.
To give some context, here is my code:
module test_mod
contains
function shewchukSum( array ) result(res)
implicit none
real,intent(in) :: array(:)
real :: res
integer :: xIdx, yIdx, i, nPartials
real :: partials(100), hi, lo, x, y
nPartials = 0
do xIdx=1,size(array)
i = 0
x = array(xIdx)
! Calculate the partial sums
do yIdx=1,nPartials
y = partials(yIdx)
hi = x + y
if ( abs(x) < abs(y) ) then
lo = x - (hi - y)
else
lo = y - (hi - x)
endif
x = hi
! If a round-off error occured, store it. Exact comparison intended
if ( lo == 0. ) cycle
i = i + 1 ; partials(i) = lo
enddo ! yIdx
nPartials = i + 1 ; partials( nPartials ) = x
enddo ! xIdx
res = sum( partials(:nPartials) )
end function
end module
And the calling test program is
program test
use test_mod
implicit none
print *, sum([1.e0, 1.e16, 1.e0, -1.e16])
print *,shewchukSum([1.e0, 1.e16, 1.e0, -1.e16])
end program
Compilation using gfortran
with produces the correct results for all optimization levels:
./a.out
0.00000000
2.00000000
ifort
, however, produces zeros for all optimizations above -O0
:
./a.out
0.00000000
0.00000000
I tried to debug the code and went down to the assembly level and figured out that ifort
is optimizing away the calculation of lo
and the operations after if ( lo == 0. ) cycle
.
Is there a possibility to force ifort
to perform the complete operation for all levels of optimization? This addition is a critical part of the calculations, and I want it to run as fast as possible.
For comparison, gfortran
at -O2
executes this code approximately eight to ten times faster than ifort
at -O0
(measured for arrays of length >100k).
When it comes to floating point operations, the default for ifort is generally for performance rather than strict correctness.
There are a number of options to control the floating point behaviour. Using ifort 16 and the option -assume protect_parens
I get the expected behaviour even at higher optimization levels.
Additionally, there are the options -fp-model precise -fp-model source
(this latter implies -assume protect_parens
) which may also be of interest to you. The default for -fp-model
is fast=1
which
allows value-unsafe optimizations
Naturally, these may have an impact on performance, so other options around the floating point behaviour are also worth considering.
Much further detail can be found in an Intel publication.