Search code examples
c++c++11profilerperf

How to interpret the report of perf


I'm learning how to use the tool perf to profile my c++ project. Here is my code:

#include <iostream>
#include <thread>
#include <mutex>
#include <vector>


std::mutex mtx;
long long_val = 0;

void do_something(long &val)
{
    std::unique_lock<std::mutex> lck(mtx);
    for(int j=0; j<1000; ++j)
        val++;
}


void thread_func()
{
    for(int i=0; i<1000000L; ++i)
    {
        do_something(long_val);
    }
}


int main(int argc, char* argv[])
{
    std::vector<std::unique_ptr<std::thread>> threads;
    for(int i=0; i<100; ++i)
    {
        threads.push_back(std::move(std::unique_ptr<std::thread>(new std::thread(thread_func))));
    }
    for(int i=0; i<100; ++i)
    {
        threads[i]->join();
    }
    threads.clear();
    std::cout << long_val << std::endl;
    return 0;
}

To compile it, I run g++ -std=c++11 main.cpp -lpthread -g and then I get the executable file named a.out.

Then I run perf record --call-graph dwarf -- ./a.out and wait for 10 seconds, then I press Ctrl+c to interrupt the ./a.out because it needs too much time to execute.

Lastly, I run perf report -g graph --no-children and here is the output:

enter image description here

My goal is to find which part of the code is the heaviest. So it seems that this output could tell me do_something is the heaviest part(46.25%). But when I enter into do_something, I can not understand what it is: std::_Bind_simple, std::thread::_Impl etc.

So how to get more useful information from the output of perf report? Or we can't get more except the fact that do_something is the heaviest?


Solution

  • With the help of @Peter Cordes, I pose this answer. If you have something more useful, please feel free to pose your answers.

    You forgot to enable optimization at all when you compiled, so all the little functions that should normally inline away are actually getting called. Add -O3 or at least -O2 to your g++ command line. Optionally also profile-guided optimization if you really want gcc to do a good job on hot loops.

    After adding -O3, the output of perf report becomes:

    enter image description here

    Now we can get something useful from futex_wake and futex_wait_setup as we should know that mutex in C++11 is implemented by futex of Linux. So the result is that mutex is the hotspot in this code.