Tags: c, gcc, optimization, x86-64, inline

Large run-time performance drop from gcc 7.5.0-6ubuntu2 to gcc 8.4.0-3ubuntu2


After switching to gcc 11 on Ubuntu 22.04, I noticed a ~90% degradation in my C application's performance, by my own measurement.
Narrowing it down, I found that the degradation first appears with gcc 8.4.0-3ubuntu2.
I am now on Ubuntu 22.04 with gcc-7, gcc-8 and gcc (which is gcc 11) installed.
Compiling the exact same code with gcc-7 gives good results, while compiling with gcc-8 (or gcc 11) produces a slower application.

I did not find anything in the gcc 8 list of changes that should matter here.
I don't have a simple application that reproduces this. If I had one, I would already know the source of the issue.

Any suggestions?
Did something change between gcc 7.5 and gcc 8.4?


** Edit ** - after running gprof on the old, fast build (gcc-7) and the new, slow build (gcc-8), I think the most valuable thing I see is this entry, in second place in the flat profile of the new, slow build:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 39.27      9.83     9.83   173488     0.00     0.00  main_function
 22.89     15.56     5.73                             ...
 ...

Solution

  • OK, this turned out to be the cause.
    For some reason gcc-7 did not care about it, but from gcc-8 onward it became an issue.

    I had a big array declared on the stack of main_function().
    sizeof(my_big_struct) -> 100, so the 20000-element array takes about 2 MB of stack.

    Pseudo-code:

    void main_function() {
      my_big_struct bigstruct_arr[20000];
      ...
    }
    
    • gcc-7 ran without any problems.
    • gcc-8 (and 11) ran as well, but really slowly. I'm not sure why: too much time spent on allocation, or on array access?
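To make the size of that local array concrete, here is a minimal sketch of the arithmetic. The struct layout is a hypothetical stand-in (only its 100-byte size comes from the post), and the names `BIG_ARRAY_LEN`, `big_array_bytes` and `stack_soft_limit` are invented for illustration. It also reads the process's soft stack limit through `getrlimit`, which may differ from the stack size of a thread created with custom attributes.

```c
#include <assert.h>        /* static_assert (C11) */
#include <stddef.h>
#include <sys/resource.h>  /* getrlimit, RLIMIT_STACK (POSIX) */

/* Hypothetical stand-in for my_big_struct: only the size is known. */
typedef struct {
    unsigned char payload[100];
} my_big_struct;

static_assert(sizeof(my_big_struct) == 100, "size taken from the post");

enum { BIG_ARRAY_LEN = 20000 };

/* Bytes the local array occupies in every stack frame of main_function(). */
size_t big_array_bytes(void) {
    return sizeof(my_big_struct) * (size_t)BIG_ARRAY_LEN;
}

/* Soft stack limit of the process in bytes; 0 if unknown or unlimited. */
unsigned long long stack_soft_limit(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY) {
        return 0;
    }
    return (unsigned long long)lim.rlim_cur;
}
```

With these assumptions `big_array_bytes()` comes to 2,000,000 bytes per call, a large share of a typical stack, which is one reason such arrays are usually moved off the stack.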

    As you can see from the perf output, main_function() is indeed the problematic one.
    It is a bit misleading, because the address 0x5594faaa3090 takes all the blame.
    I did not understand what this address meant until I realized that it is the array bigstruct_arr.

    Samples: 36K of event 'cycles', Event count (approx.): 13441606624
      Children      Self  Command          Shared Object       Symbol
    -   90.47%    88.88%  trd_1            my_process          [.] main_function
       + 88.26% 0x5594faaa3090
       + 1.60% main_function
         0.61% 0
    

    The solution was, of course, to define the array as a global or to allocate it with malloc.
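The two options can be sketched as follows. This is an illustration under assumptions, not the original code: the struct layout, the function names `main_function_static`, `bigstruct_arr_create` and `main_function_heap`, and the placeholder "work" are all hypothetical. The heap variant allocates the buffer once and passes it in, on the assumption that a fresh `malloc` on each of the many calls would be wasteful.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdlib.h>   /* malloc, free */

/* Hypothetical stand-in for my_big_struct: only the size is known. */
typedef struct {
    unsigned char payload[100];
} my_big_struct;

static_assert(sizeof(my_big_struct) == 100, "size taken from the post");

enum { BIG_ARRAY_LEN = 20000 };

/* Option 1: static storage. Reserved once for the program's lifetime and
 * zero-initialized, but shared by every call, so the function is neither
 * reentrant nor thread-safe without extra locking. */
static my_big_struct bigstruct_arr_static[BIG_ARRAY_LEN];

int main_function_static(void) {
    my_big_struct *arr = bigstruct_arr_static;
    arr[0].payload[0] = 1;            /* placeholder for the real work */
    return arr[0].payload[0];
}

/* Option 2: heap storage. The caller allocates the buffer once, reuses it
 * across calls, and frees it when done. Returns NULL on failure. */
my_big_struct *bigstruct_arr_create(void) {
    return malloc(sizeof(my_big_struct) * (size_t)BIG_ARRAY_LEN);
}

int main_function_heap(my_big_struct *arr) {
    if (arr == NULL) {
        return -1;                    /* allocation failed earlier */
    }
    arr[BIG_ARRAY_LEN - 1].payload[0] = 2;   /* placeholder for the real work */
    return arr[BIG_ARRAY_LEN - 1].payload[0];
}
```

A caller would create the buffer with `bigstruct_arr_create()` before the first call, pass it to every `main_function_heap()` call, and release it with `free()` at shutdown. If several threads run the function concurrently, each would need its own buffer.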