Our parallel Fortran program runs more than twice as slowly after updating the OS to Ubuntu 14.04 and rebuilding with gfortran 4.8.2. Unfortunately, I can no longer measure which parts of the code slowed down (not without downgrading the OS), because I did not save any gprof profiling information when compiling under the old OS.
Because the program does lots of matrix inversions, my guess is that some library (LAPACK?) or programming interface (OpenMP?) was updated between Ubuntu 12 and 14 in a way that slows everything down. I suspect this is a general problem that somebody here may already know about. What is the solution to get fast Fortran code back, other than downgrading to Ubuntu 12 or 13?
All libraries were installed from the repositories with apt-get, so they should also have been upgraded when I upgraded the system with apt-get dist-upgrade. I could, however, check whether they are indeed the latest versions and/or build them from scratch.
I followed Steabert's advice and profiled the present code: I recompiled with gfortran -pg and checked the performance with gprof. The program was suspiciously slow when calling some old F77 subroutines, which I translated to f90, but without any performance improvement.
I played with the suggested flags and compared the time of one program iteration. The flags -fno-aggressive-loop-optimizations, -llapack and -lblas did not yield any significant performance improvement. With -latlas, -llapack_latlas and -lf77blas the program did not link (/usr/bin/ld: cannot find -lf77blas, etc.), even though the libraries exist and are in the right path. Both the experiments with the flags and the performance analysis suggest that my first guess (that the slowdown is related to matrix inversions, LAPACK, etc.) was wrong. Rather, the slowdown seems to be in a part of the code where no heavy linear algebra is performed.
Using objdump my_exec -s, I found out that before the OS upgrade my program had been compiled with gfortran 4.6.3 instead of the present gfortran 4.8.2. I could now try to compile the code with the old compiler.
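Incidentally, a program can record which compiler built it, so that this kind of question can be answered later without objdump. The sketch below assumes a compiler that supports the Fortran 2008 intrinsics compiler_version() and compiler_options() from iso_fortran_env (recent gfortran versions provide them, but check your compiler's documentation); the module and function names are hypothetical.

```fortran
! Hypothetical helper: describe the compiler and flags that built this binary.
! Relies on the Fortran 2008 intrinsics compiler_version() and
! compiler_options() from the intrinsic module iso_fortran_env.
module build_info_mod
  use iso_fortran_env, only: compiler_version, compiler_options
  implicit none
  private
  public :: build_info

contains

  ! Returns one line suitable for a log file or a start-up banner.
  function build_info() result(s)
    character(len=:), allocatable :: s
    s = 'Compiled by ' // compiler_version() // ' using options: ' // compiler_options()
  end function build_info

end module build_info_mod
```

For example, printing build_info() at program start-up leaves the compiler version and flags in every run's output, so a later slowdown can be matched to the compiler that produced each executable.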
This is probably not a 100% satisfactory answer, but it solved my performance problem, so here it is:
I decided to use GDB (Valgrind did not work for me): I compiled with the flag -g, ran “gdb myprogramname”, typed “run” at the GDB prompt to execute the program, paused it with Ctrl+C, checked what the threads were doing with “info threads” and resumed with “continue”. I did this several times at random moments to build a rough statistic of where the program was spending most of its time. This soon confirmed what gprof had found before, i.e., that my program was spending lots of time in the function I had translated to f90. However, I now also found out that the mathematical operation taking a particularly long time inside this function was exponentiation, as suggested by the calls into the C code in e_powf.c. My function (the equation of state of sea water) contains lots of high-order polynomials with terms like T**3 and T**4. To avoid the calls into e_powf.c and see whether this improved the performance of the code, I changed all terms of the type T**2 to T*T, T**3 to T*T*T, etc. Here is an extract of the function as it was before:
! RW = 999.842594 + 6.793952E-2*T - 9.095290E-3*T**2+ 1.001685E-4*T**3 - 1.120083E-6*T**4 + 6.536332E-9*T**5
and how it is now:
RW = 999.842594 + 6.793952E-2*T - 9.095290E-3*T*T+ 1.001685E-4*T*T*T - 1.120083E-6*T*T*T*T + 6.536332E-9*T*T*T*T*T
As a result, my program runs twice as fast again (i.e., as fast as before the OS upgrade). While this solves my performance problem, I cannot be 100% sure whether the original slowdown is really related to the OS upgrade or to the compiler change from 4.6.3 to 4.8.2, although the fact that performance is now back at the pre-upgrade level strongly suggests that it is.
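Going one step further than writing out T*T*T, the same polynomial can be evaluated in nested (Horner) form, which needs only five multiplications and five additions and no exponentiation at all. The sketch below is not the original code: the module and function names are hypothetical, and the coefficients are copied from the RW expression above (with the minus signs folded into a2 and a4). It keeps the ** form next to the product and Horner forms so the three can be checked against each other.

```fortran
! Hypothetical module: three equivalent evaluations of the RW(T) polynomial
! quoted above (coefficients copied from that line, default real kind).
module rw_poly
  implicit none
  private
  public :: rw_power, rw_product, rw_horner

  real, parameter :: a0 = 999.842594, a1 = 6.793952e-2, a2 = -9.095290e-3, &
                     a3 = 1.001685e-4, a4 = -1.120083e-6, a5 = 6.536332e-9

contains

  ! Original form: integer powers written with the ** operator.
  pure real function rw_power(t)
    real, intent(in) :: t
    rw_power = a0 + a1*t + a2*t**2 + a3*t**3 + a4*t**4 + a5*t**5
  end function rw_power

  ! Rewritten form from the question: powers spelled out as products.
  pure real function rw_product(t)
    real, intent(in) :: t
    rw_product = a0 + a1*t + a2*t*t + a3*t*t*t + a4*t*t*t*t + a5*t*t*t*t*t
  end function rw_product

  ! Horner form: a0 + t*(a1 + t*(a2 + ...)), 5 multiplications, 5 additions.
  pure real function rw_horner(t)
    real, intent(in) :: t
    rw_horner = a0 + t*(a1 + t*(a2 + t*(a3 + t*(a4 + t*a5))))
  end function rw_horner

end module rw_poly
```

After use rw_poly, a call such as RW = rw_horner(T) replaces the original line. Because the three forms round differently, their results can differ in the last bits, so compare them with a tolerance rather than for exact equality.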
Unfortunately, “locate e_powf” does not return any results on my system; the function seems to be available only in compiled form, without its source code. From googling, it seems that e_powf.c belongs to glibc's math library rather than to gfortran itself, and that it has not been updated lately (judging from occurrences on the Internet such as http://koala.cs.pub.ro/lxr/#glibc/sysdeps/ieee754/flt-32/e_powf.c), so if something changed from Ubuntu 12 to 14 or from gfortran 4.6.3 to 4.8.2, it is more likely something subtle in the way this function is used.
Because I found discussions on the Internet about whether using T*T instead of T**2, etc. brings any performance improvement, and most people seem skeptical about it (for instance: http://computer-programming-forum.com/49-fortran/6b042075d8c77a4b.htm ; or a closed question on Stack Overflow: Tips and tricks on improving Fortran code performance), I double-checked my finding. I am now pretty sure that using products of a variable (and thereby avoiding the calls into e_powf.c) is faster than exponentiation, at least with gfortran 4.8.2 (whatever the reason).
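To check this on a given compiler, the variants can be timed in isolation. The sketch below is a hypothetical micro-benchmark, not code from the question: it compares an integer exponent (t**3), a real exponent (t**3.0) and an explicit product (t*t*t). As far as I know, gfortran normally expands a small integer constant exponent into multiplications, while a real exponent is typically evaluated by calling powf from the C math library; treat this as an assumption and confirm it for your own compiler and flags, e.g. by inspecting the assembly produced with -S.

```fortran
! Hypothetical micro-benchmark (names and sizes are illustrative only):
! times three ways of computing t**3 over an array of temperatures.
module pow_bench
  use iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: cube_sums

contains

  ! sums(k) and seconds(k) refer to variant k:
  !   1 = t**3 (integer exponent), 2 = t**3.0 (real exponent), 3 = t*t*t.
  ! The sums are returned so the compiler cannot discard the loops.
  subroutine cube_sums(n, sums, seconds)
    integer, intent(in) :: n
    real(real64), intent(out) :: sums(3)
    real, intent(out) :: seconds(3)
    real, allocatable :: t(:)
    integer(int64) :: c0, c1, rate
    integer :: i

    allocate(t(n))
    do i = 1, n
      t(i) = 30.0 * real(mod(i, 1000)) / 1000.0   ! values in [0, 30)
    end do
    sums = 0.0_real64

    call system_clock(c0, rate)
    do i = 1, n
      sums(1) = sums(1) + t(i)**3
    end do
    call system_clock(c1)
    seconds(1) = real(c1 - c0) / real(rate)

    call system_clock(c0)
    do i = 1, n
      sums(2) = sums(2) + t(i)**3.0
    end do
    call system_clock(c1)
    seconds(2) = real(c1 - c0) / real(rate)

    call system_clock(c0)
    do i = 1, n
      sums(3) = sums(3) + t(i)*t(i)*t(i)
    end do
    call system_clock(c1)
    seconds(3) = real(c1 - c0) / real(rate)

    deallocate(t)
  end subroutine cube_sums

end module pow_bench
```

A driver program would call cube_sums with a large n (say ten million) and print seconds; the three sums should agree closely, and only the timings are of interest. Such a loop is sensitive to the optimization level and to vectorization, so it indicates a trend rather than the exact cost inside the real equation of state.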
Thanks a lot to all of you who commented; it surely helped me a lot and I learned plenty!