fortran intel-fortran

Difference in runtime between floating-point additions


I was writing some code when I noticed that one line takes a very long time to run. Here is a simplified version (the line is marked with !*):

program main

implicit none

real*8, allocatable :: x(:), y(:), f(:)
real*8 :: one, two, six, alpha, sigma, eps, m, n, r2, r, ff, start, finish, rr
integer*8 :: q, i, j

q = 10000
one = 1.
two = 2.
six = 6.
alpha = 4.
n = 12.
m = 6.
eps = 5.
sigma = 1.
rr = 2.1234567654324556

allocate(x(q), y(q), f(q))
call RANDOM_NUMBER(x)
call RANDOM_NUMBER(y)
f(:) = 0.
call CPU_TIME(start)

do i=1,q
    do j=i+1,q
        r2 = (x(i)-x(j))**two+(y(i)-y(j))**two
        ff = six*alpha*eps*(one/r2*(sigma**m/(r2**(m/two))-two*sigma**n/(r2**(n/two))))
        r = -(x(i)-x(j))*ff
        f(i) = f(i) + r     !*
    end do
end do

call CPU_TIME(finish)
print*, finish-start


end program main

This code takes approximately 10 seconds to run, but if you replace r with rr in the line marked with !*, the run time drops to 0.01 seconds. Can anyone explain this? What is the difference between r and rr, given that both are real*8?

I am using Windows 8.1, Visual Studio 12 Ultimate, Intel Composer XE 2013 and the -O2 flag.


Solution

  • Converting the comments into an answer...

    If you use rr instead of r in the marked line, none of the other computations in that loop (r2, ff and r) affect the result any more, so the compiler can optimize them away. My guess is that this is what produces the "performance increase" you see.
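
    One way to check this guess is to time both variants side by side and print a value derived from each result. The following is only a minimal sketch, not the original program: the problem size q = 2000, the simplified expression for ff, and the second array g are assumptions made for illustration.

    ```fortran
    program dead_code_sketch
        implicit none
        integer, parameter :: dp = kind(1.0d0)
        integer, parameter :: q = 2000      ! assumed size, smaller than the question's 10000
        real(dp), allocatable :: x(:), y(:), f(:), g(:)
        real(dp) :: r2, ff, r, rr, t0, t1, t2
        integer :: i, j

        rr = 2.1234567654324556_dp
        allocate(x(q), y(q), f(q), g(q))
        call random_number(x)
        call random_number(y)
        f = 0.0_dp
        g = 0.0_dp

        call cpu_time(t0)
        ! Variant A: r feeds f(i), so r2 and ff really have to be computed.
        do i = 1, q
            do j = i + 1, q
                r2 = (x(i) - x(j))**2 + (y(i) - y(j))**2
                ff = 1.0_dp / r2**4 - 2.0_dp / r2**7   ! simplified stand-in for the force term
                r = -(x(i) - x(j)) * ff
                f(i) = f(i) + r
            end do
        end do
        call cpu_time(t1)

        ! Variant B: only the constant rr feeds g(i); r2, ff and r are never
        ! used afterwards, so an optimizing compiler is free to drop them.
        do i = 1, q
            do j = i + 1, q
                r2 = (x(i) - x(j))**2 + (y(i) - y(j))**2
                ff = 1.0_dp / r2**4 - 2.0_dp / r2**7
                r = -(x(i) - x(j)) * ff
                g(i) = g(i) + rr
            end do
        end do
        call cpu_time(t2)

        ! Printing sums of f and g keeps both loops observable to the compiler.
        print *, 'with r :', t1 - t0, ' sum(f) =', sum(f)
        print *, 'with rr:', t2 - t1, ' sum(g) =', sum(g)
    end program dead_code_sketch
    ```

    With optimization enabled, the second loop may run much faster than the first. The actual timings depend on the compiler and its flags, so treat any difference as a hint rather than proof.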

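    Also, most of the calculations you perform in the loop (for example six*alpha*eps, sigma**m, sigma**n, m/two and n/two) do not depend on x and y, so you can easily pre-compute them outside the loop. Note too that, depending on how well your compiler optimizes, x**2 (integer exponent) can be faster than x**2.0 (real exponent).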
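
Both suggestions could look roughly like the sketch below. It assumes that m = 6 and n = 12 stay fixed, as in the question, so the real-valued exponents can be replaced by integer powers. The names pref, s6, s12, dx, dy, inv_r2 and inv_r6 are hypothetical helpers introduced here, and q is reduced to 2000 to keep the run short.

```fortran
program precompute_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: q = 2000          ! assumed size for a quick run
    real(dp), parameter :: alpha = 4.0_dp, eps = 5.0_dp, sigma = 1.0_dp
    ! Loop-invariant factors, evaluated once instead of in every iteration
    real(dp), parameter :: pref = 6.0_dp * alpha * eps
    real(dp), parameter :: s6 = sigma**6     ! sigma**m with m = 6
    real(dp), parameter :: s12 = sigma**12   ! sigma**n with n = 12
    real(dp), allocatable :: x(:), y(:), f(:)
    real(dp) :: dx, dy, r2, inv_r2, inv_r6, ff
    integer :: i, j

    allocate(x(q), y(q), f(q))
    call random_number(x)
    call random_number(y)
    f = 0.0_dp

    do i = 1, q
        do j = i + 1, q
            dx = x(i) - x(j)
            dy = y(i) - y(j)
            r2 = dx**2 + dy**2               ! integer exponents instead of **two
            inv_r2 = 1.0_dp / r2
            inv_r6 = inv_r2**3               ! 1 / r2**(m/2) for m = 6
            ! 1 / r2**(n/2) for n = 12 is inv_r6**2
            ff = pref * inv_r2 * (s6 * inv_r6 - 2.0_dp * s12 * inv_r6**2)
            f(i) = f(i) - dx * ff            ! same as f(i) + r with r = -dx*ff
        end do
    end do

    ! Use the result so the compiler cannot discard the loop.
    print *, sum(f)
end program precompute_sketch
```

If m and n have to remain run-time values, the constant prefactor and the powers of sigma can still be computed once before the loop; only the integer-power shortcut would no longer apply.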