c++ profiling gprof

_fu2___ZSt4cout is taking 21.49% of the running time in my C++ code


I am using gprof to optimize my C++ code, and I am obtaining the following results:

Flat profile:

Each sample counts as 0.01 seconds.
 %    cumulative  self               self     total
time   seconds   seconds    calls    s/call   s/call  name    
21.49      2.31     2.31                              _fu2___ZSt4cout
12.93      3.70     1.39   1560037     0.00     0.00  __gnu_cxx::new_allocator<DataINSPVAS>::construct(DataINSPVAS*, DataINSPVAS const&)
 8.56      4.62     0.92  30267700     0.00     0.00  __gnu_cxx::new_allocator<AntennaData>::construct(AntennaData*, AntennaData const&)
 6.14      5.28     0.66 261159927     0.00     0.00  __gnu_cxx::__normal_iterator<char*, std::string>::__normal_iterator(char* const&)
 5.40      5.86     0.58 149234244     0.00     0.00  bool __gnu_cxx::operator!=<char*, std::string>(__gnu_cxx::__normal_iterator<char*, std::string> const&, __gnu_cxx::__normal_iterator<char*, std::string> const&) ...

According to this flat profile, the function _fu2___ZSt4cout accounts for 21.49% of the running time. Does anybody know what _fu2___ZSt4cout stands for?


Solution

  • (Quick point: There are so many questions on SO like this.)

    First, gprof is a "CPU profiler". That means it is shut off during IO or any other blocking syscall. Your program could run for 100 seconds, spending 99 of them on IO, and gprof will act as though it only ran for 1 second.
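The gap between CPU time and wall-clock time described above can be seen with a small sketch. The helper name `measure_blocking_wait` and the `Timing` struct are made up for illustration, and a sleep stands in for blocking IO. It assumes a POSIX-like platform, where `std::clock` reports processor time; Microsoft's C runtime reports elapsed time from `clock()` instead, so the gap would not show up there.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Illustrative result type (not part of any library).
struct Timing {
    double cpu_seconds;   // processor time consumed by this process
    double wall_seconds;  // elapsed real time
};

// Hypothetical helper: a sleep stands in for a blocking IO call.
Timing measure_blocking_wait(std::chrono::milliseconds wait) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    // While blocked here the process uses almost no CPU, so a profiler
    // that only samples CPU time sees almost nothing.
    std::this_thread::sleep_for(wait);

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}
```

Calling `measure_blocking_wait(std::chrono::milliseconds(500))` on such a platform should report a wall time of about half a second and a CPU time close to zero, which is the portion a CPU-time profile would account for.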

    Second, you're looking at self time. Self time is useless, except for functions that crunch a lot without calling subfunctions. So if you've got a bubble sort of an integer array, and you're spending a large fraction of overall time in it, gprof will show it as a bottleneck. Change it to a sort on strings, where comparison requires calling a function, and gprof will show a big % in strcmp, which is not where the problem is at all.
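The bubble sort comparison above can be sketched as two functions that run the same O(n^2) algorithm. The function names are invented for this example. In the integer version the comparison is done inline, so a flat profile would charge the cost to the sort itself; in the string version each comparison is a call to `strcmp`, so the self time would tend to show up there, even though the number of comparisons is the actual problem. A compiler may inline or expand `strcmp`, so real profiles can differ.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Comparison is a plain integer compare: the time is spent inside this
// function and counts as its self time.
void bubble_sort_ints(std::vector<int>& v) {
    for (std::size_t n = v.size(); n > 1; --n) {
        for (std::size_t i = 0; i + 1 < n; ++i) {
            if (v[i] > v[i + 1]) std::swap(v[i], v[i + 1]);
        }
    }
}

// Same algorithm, but every comparison calls strcmp. The sort's own self
// time shrinks and the callee's grows, while the cause (too many
// comparisons) is unchanged.
void bubble_sort_cstrings(std::vector<const char*>& v) {
    for (std::size_t n = v.size(); n > 1; --n) {
        for (std::size_t i = 0; i + 1 < n; ++i) {
            if (std::strcmp(v[i], v[i + 1]) > 0) std::swap(v[i], v[i + 1]);
        }
    }
}
```

Both functions sort correctly; the point is only where a flat, self-time profile would attribute the cost.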

    From looking at your output, my guess is that your program mostly does IO, so it doesn't surprise me if, of the small amount of time it spends actually running, a large part of that is going into and coming out of library IO routines. You're also showing a lot of self time in new_allocator<...>::construct (presumably copying elements into containers) and in a string iterator. Not surprising at all.

    If you're looking for a profiler, you want one that samples the entire call stack, on wall-clock time (not CPU time), and reports the percent of time each line of code appears on those stacks. One such profiler is Zoom. (BTW, don't fall for the line that you need high-frequency samples to get "accuracy". If you get 1000 samples, that's way more than enough to see what's taking time.)
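The reporting step described above, the percent of samples on which each frame appears, can be sketched as a small aggregation. This is not how Zoom or any particular profiler is implemented; `StackSample`, `inclusive_percent`, and the frame names in the usage note are hypothetical, and collecting the samples is left to the tool. Function names stand in for lines of code here.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One captured call stack, outermost frame first (illustrative type).
using StackSample = std::vector<std::string>;

// For each frame, compute the percentage of samples whose stack contains it.
std::map<std::string, double> inclusive_percent(
        const std::vector<StackSample>& samples) {
    std::map<std::string, double> result;
    if (samples.empty()) return result;

    std::map<std::string, std::size_t> hits;
    for (const StackSample& stack : samples) {
        // Count each frame once per sample, even if recursion repeats it.
        const std::set<std::string> unique(stack.begin(), stack.end());
        for (const std::string& frame : unique) ++hits[frame];
    }
    for (const auto& entry : hits) {
        result[entry.first] = 100.0 * entry.second / samples.size();
    }
    return result;
}
```

Given four samples where three pass through a made-up `load` function and one through `sort`, `load` is reported at 75% and `sort` at 25%, regardless of how much self time either has.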

    When I'm tuning performance, I use this method.