Is std::packaged_task really expensive?

I am surprised by the results of the following code, compiled with gcc 4.7.2 on openSUSE Linux:

#include <cmath>
#include <chrono>
#include <cstdlib>
#include <vector>
#include <iostream>
#include <future>

int main(void)
{
  const long N = 10*1000*1000;
  std::vector<double> array(N);
  for (auto& i : array)
    i = rand()/333.;

  std::chrono::time_point<std::chrono::system_clock> start, end;
  start = std::chrono::system_clock::now();
  for (auto& i : array)
    pow(i,i);
  end = std::chrono::system_clock::now();
  std::chrono::duration<double> elapsed_seconds = end-start;
  std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

  start = std::chrono::system_clock::now();
  for (auto& i : array)
    std::packaged_task<double(double,double)> myTask(pow);
  elapsed_seconds = std::chrono::system_clock::now()-start;
  std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

  start = std::chrono::system_clock::now();
  for (auto& i : array)
    std::packaged_task<double()> myTask(std::bind(pow,i,i));
  elapsed_seconds = std::chrono::system_clock::now()-start;
  std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

  return 0;
}

The results look like this (and are fairly consistent amongst runs):

elapsed time: 0.694315s
elapsed time: 6.49907s
elapsed time: 8.42619s

If I interpret the results correctly, just creating a std::packaged_task (not even executing it or storing its arguments yet) is already ten times more expensive than executing pow. Is that a valid conclusion?

Why is this so?

Is this specific to gcc?

Solution

You are not timing the execution of a packaged_task, only its creation.

std::packaged_task<double(double,double)> myTask(pow);

This only creates myTask; it does not execute it. What you should be measuring is the call myTask(i, i). I changed your program to do that (and dropped the std::bind measurement); the full listing is at the end of this answer.

The results are even worse than what you measured:

timing raw
elapsed time: 0.578244s

timing ptask
elapsed time: 20.7379s

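I guess packaged_tasks are not suitable for small tasks that are repeated many times; here the overhead is clearly greater than the task itself. Each packaged_task shares a state object with its future, and reset() sets up a fresh one, so every iteration likely pays for that bookkeeping and synchronization on top of the call to pow. My reading is that you should use them in multitasking code, for tasks that take longer than the overhead of calling and synchronizing a packaged_task.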
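To illustrate the point about task size, here is a minimal sketch that creates one packaged_task per chunk of the array instead of one per element, so the per-task overhead is spread over many pow calls. The helper names sum_pow_range and sum_pow_chunked and the idea of summing the results are my own assumptions for the example, not something from your program. I have not benchmarked it, and with gcc you may need to build with -pthread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Sums pow(x, x) over values[first, last); this is the "large" unit of work.
double sum_pow_range(const std::vector<double>& values,
                     std::size_t first, std::size_t last)
{
    double sum = 0.0;
    for (std::size_t k = first; k < last; ++k)
        sum += std::pow(values[k], values[k]);
    return sum;
}

// Hypothetical helper: splits the work into at most `chunks` packaged_tasks,
// runs each on its own thread, and adds up the partial sums.
double sum_pow_chunked(const std::vector<double>& values, std::size_t chunks)
{
    assert(chunks > 0);
    std::vector<std::future<double>> results;
    std::vector<std::thread> workers;
    // Chunk length, rounded up so that every element is covered.
    const std::size_t step = (values.size() + chunks - 1) / chunks;

    for (std::size_t first = 0; first < values.size(); first += step) {
        const std::size_t last = std::min(first + step, values.size());
        std::packaged_task<double()> task(
            [&values, first, last] { return sum_pow_range(values, first, last); });
        results.push_back(task.get_future());   // grab the future before moving the task
        workers.emplace_back(std::move(task));  // the thread invokes the task once
    }

    for (auto& w : workers) w.join();

    double total = 0.0;
    for (auto& r : results) total += r.get();
    return total;
}
```

Calling sum_pow_chunked(array, 4) on the 10 million element vector would then create four tasks rather than ten million. Whether that actually beats the plain loop depends on the machine and on how expensive the work per element is.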

If you're not multitasking, I think there's no point in wrapping a function call in classes that are built for multithreading and come with synchronization primitives. Sadly, those are not free.
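To make concrete what the wrapper does beyond a plain call, here is a small sketch that puts the two side by side. The names pow_direct, pow_via_task and seconds_per_call are hypothetical helpers I made up for the comparison. The comment about a heap allocation describes what implementations typically do, not something the standard guarantees.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <future>

// The plain call: no shared state, no synchronization.
double pow_direct(double base, double exponent)
{
    return std::pow(base, exponent);
}

// The same call routed through a packaged_task on the calling thread.
double pow_via_task(double base, double exponent)
{
    // Construction sets up the shared state that will hold the result
    // (typically a heap allocation).
    std::packaged_task<double(double, double)> task(pow_direct);
    std::future<double> result = task.get_future();
    task(base, exponent);   // runs pow_direct and stores the value in the shared state
    return result.get();    // reads the value back through the future
}

// Hypothetical timing helper: average seconds per call of f over n calls.
template <typename F>
double seconds_per_call(std::size_t n, F f)
{
    assert(n > 0);
    volatile double sink = 0.0;  // keeps the compiler from discarding the calls
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t k = 0; k < n; ++k)
        sink = f(1.0 + static_cast<double>(k % 7), 1.5);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / static_cast<double>(n);
}
```

Comparing seconds_per_call(1000000, pow_direct) with seconds_per_call(1000000, pow_via_task) should give a rough per-call figure for the overhead on your machine. I would expect the second number to be noticeably larger, but I have not measured it with this exact code.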

For the record, here's what I used:

#include <cmath>
#include <chrono>
#include <cstdlib>
#include <vector>
#include <iostream>
#include <future>

int main(void)
{
  const long N = 10*1000*1000;
  std::vector<double> array(N);
  for (auto& i : array)
    i = rand()/333.;

  std::cout << "timing raw" << std::endl;
  std::chrono::time_point<std::chrono::system_clock> start, end;
  start = std::chrono::system_clock::now();
  for (auto& i : array)
    pow(i,i);
  end = std::chrono::system_clock::now();
  std::chrono::duration<double> elapsed_seconds = end-start;
  std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n\n";

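  std::cout << "timing ptask" << std::endl;
  start = std::chrono::system_clock::now();
  // Create the task once and reuse it for every element.
  std::packaged_task<double(double,double)> myTask(pow);
  for (auto& i : array)
  {
    myTask(i, i);                // run pow and store the result in the shared state
    myTask.get_future().wait();  // the result is already there, so this returns at once
    myTask.reset();              // set up a fresh shared state for the next call
  }
  elapsed_seconds = std::chrono::system_clock::now()-start;
  std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n\n";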
  return 0;
}