I am surprised by the results of the following code, compiled with gcc 4.7.2 on openSUSE Linux:
#include <cmath>
#include <chrono>
#include <cstdlib>
#include <functional>   // std::bind
#include <future>
#include <iostream>
#include <vector>
int main(void)
{
    const long N = 10*1000*1000;
    std::vector<double> array(N);
    for (auto& i : array)
        i = rand()/333.;

    // 1) raw pow calls
    std::chrono::time_point<std::chrono::system_clock> start, end;
    start = std::chrono::system_clock::now();
    for (auto& i : array)
        pow(i,i);
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    // 2) constructing (not running) one packaged_task per element
    start = std::chrono::system_clock::now();
    for (auto& i : array)
        std::packaged_task<double(double,double)> myTask(pow);
    elapsed_seconds = std::chrono::system_clock::now()-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    // 3) constructing (not running) a packaged_task around std::bind
    start = std::chrono::system_clock::now();
    for (auto& i : array)
        std::packaged_task<double()> myTask(std::bind(pow,i,i));
    elapsed_seconds = std::chrono::system_clock::now()-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
    return 0;
}
The results look like this (and are fairly consistent across runs):
elapsed time: 0.694315s
elapsed time: 6.49907s
elapsed time: 8.42619s
If I interpret the results correctly, just creating a std::packaged_task (without executing it, or even storing its arguments) is already about ten times more expensive than executing pow. Is that a valid conclusion?
Why is this so?
Is this specific to gcc?
You are not timing the execution of a packaged_task, only its creation:
std::packaged_task<double(double,double)> myTask(pow);
This does not execute myTask, it only creates it. What you should be measuring is the call myTask(i, i), so I changed your program to do that. The full listing is at the end of this answer, and I dropped the std::bind measurement.
The results are even worse than what you measured:
timing raw
elapsed time: 0.578244s
timing ptask
elapsed time: 20.7379s
I guess packaged_tasks are not suitable for small tasks executed over and over. Here the overhead is roughly 35 times the cost of the raw call. My reading is that you should use them in multithreaded code, for tasks that take considerably longer than the overhead of creating, calling and synchronizing a packaged_task.
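If you're not multithreading, I think there's no point in wrapping a function call in a class designed for it. Every packaged_task owns a dynamically allocated shared state. Its future reads the result from that state, which also carries the synchronization that lets another thread wait on it. None of that is free, sadly.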
For the record, here's what I used:
#include <cmath>
#include <chrono>
#include <cstdlib>
#include <future>
#include <iostream>
#include <vector>
int main(void)
{
    const long N = 10*1000*1000;
    std::vector<double> array(N);
    for (auto& i : array)
        i = rand()/333.;

    std::cout << "timing raw" << std::endl;
    std::chrono::time_point<std::chrono::system_clock> start, end;
    start = std::chrono::system_clock::now();
    for (auto& i : array)
        pow(i,i);
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n\n";

    std::cout << "timing ptask" << std::endl;
    start = std::chrono::system_clock::now();
    std::packaged_task<double(double,double)> myTask(pow);
    for (auto& i : array)
    {
        myTask(i, i);                 // run pow(i, i) through the task
        myTask.get_future().wait();   // get the result's future and wait on it
        myTask.reset();               // swap the used shared state for a fresh one
    }
    elapsed_seconds = std::chrono::system_clock::now()-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n\n";
    return 0;
}