I'm learning how to use the tool perf to profile my C++ project. Here is my code:
#include <iostream>
#include <memory>   // std::unique_ptr
#include <mutex>
#include <thread>
#include <vector>

std::mutex mtx;
long long_val = 0;

// Takes the global mutex, then increments val 1000 times while holding it.
void do_something(long &val)
{
    std::unique_lock<std::mutex> lck(mtx);
    for(int j=0; j<1000; ++j)
        val++;
}

// Each thread calls do_something one million times.
void thread_func()
{
    for(int i=0; i<1000000L; ++i)
    {
        do_something(long_val);
    }
}

int main(int argc, char* argv[])
{
    // Start 100 threads that all contend for the same mutex.
    std::vector<std::unique_ptr<std::thread>> threads;
    for(int i=0; i<100; ++i)
    {
        threads.push_back(std::move(std::unique_ptr<std::thread>(new std::thread(thread_func))));
    }
    for(int i=0; i<100; ++i)
    {
        threads[i]->join();
    }
    threads.clear();
    std::cout << long_val << std::endl;
    return 0;
}
To compile it, I run g++ -std=c++11 main.cpp -lpthread -g and get an executable file named a.out.
Then I run perf record --call-graph dwarf -- ./a.out, wait for 10 seconds, and press Ctrl+C to interrupt ./a.out because it takes too long to finish.
Lastly, I run perf report -g graph --no-children and here is the output:
My goal is to find which part of the code is the heaviest. The output seems to tell me that do_something is the heaviest part (46.25%). But when I expand do_something, I cannot understand the entries I see: std::_Bind_simple, std::thread::_Impl, etc.
So how can I get more useful information from the output of perf report? Or is there nothing more to learn beyond the fact that do_something is the heaviest?
With the help of @Peter Cordes, I am posting this answer. If you have something more useful, please feel free to post your own answer.
You forgot to enable optimization at all when you compiled, so all the little functions that should normally inline away are actually getting called. Add -O3 or at least -O2 to your g++ command line. Optionally also profile-guided optimization if you really want gcc to do a good job on hot loops.
After adding -O3, the output of perf report becomes:
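Now we can get something useful from futex_wake and futex_wait_setup: on Linux, a C++11 std::mutex is built on top of the kernel's futex mechanism, and these kernel functions are part of the path taken when threads have to wait on or wake up from a contended lock. So the result is that the mutex is the hotspot in this code.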
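If the mutex really is the hotspot, one possible follow-up is to take the lock less often. The sketch below is only an illustration under my own assumptions, not something I have profiled: the names counter_mtx, shared_total, worker and run_with_local_sums are hypothetical and do not appear in the original program. Each thread sums into a private local variable and locks the mutex once at the end, instead of once per do_something call.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names for illustration only; they are not in the original code.
std::mutex counter_mtx;
long shared_total = 0;

// Each thread accumulates into a private local variable and takes the mutex
// a single time, rather than once per call as do_something() does.
void worker(int calls, int increments_per_call)
{
    long local = 0;
    for (int i = 0; i < calls; ++i)
        for (int j = 0; j < increments_per_call; ++j)
            ++local;

    std::lock_guard<std::mutex> lck(counter_mtx);
    shared_total += local;
}

// Starts num_threads workers, waits for all of them, and returns the total.
long run_with_local_sums(int num_threads, int calls, int increments_per_call)
{
    shared_total = 0;
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker, calls, increments_per_call);
    for (auto &th : threads)
        th.join();
    return shared_total;
}
```

Calling run_with_local_sums(100, 1000000, 1000) would correspond to the thread and loop counts in the question. Whether this actually removes the futex entries from the profile would have to be checked by running perf record and perf report again on the modified program.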