I'm trying to implement the simple threadpool with the as low overhead as possible (preferably at C++14).
The general idea is to "pack" tasks to lambda without arguments (but with capture list), and provide a single public function to add tasks to threadpool.
Let's consider the following simple (and ugly) example, illustrating the problem (it's not the threadpool itself, but the simplified code snippet with helper class VerboseStr
).
#include <iostream>
#include <string>
#include <functional>
#include <thread>
// Small helper macros
#define T_OUT(str) std::cout << Ident() \
<< "[" << std::to_string(id_) << "] " \
<< str << std::endl; \
#define T_OUT_FROMTO(str) std::cout << Ident() \
<< "[" << std::to_string(i.id_) << "->" << std::to_string(id_) << "] " \
<< str << std::endl; \
#define T_SLEEP(ms) std::this_thread::sleep_for(std::chrono::milliseconds(ms));
// Just a verbose class, holding std::string inside
/////////////////////////////////////////////////////////
class VerboseStr
{
std::string val_;
int id_;
static int GetId() { static int id = 0; return ++id; }
// Several spaces to ident lines
std::string Ident() const { return std::string(id_, ' '); }
public:
VerboseStr() : id_(GetId())
{
T_OUT("Default constructor called");
};
~VerboseStr()
{
val_ = "~Destroyed!";
T_OUT("Destructor called");
}
VerboseStr(const std::string& i) : val_(i), id_(GetId())
{
T_OUT("Create constructor called");
};
VerboseStr(const VerboseStr& i) : val_(i.val_), id_(GetId())
{
val_ = i.val_;
T_OUT_FROMTO("Copy constructor called");
};
VerboseStr(VerboseStr&& i) noexcept : val_(std::move(i.val_)), id_(GetId())
{
T_OUT_FROMTO("Move constructor called");
};
VerboseStr& operator=(const VerboseStr& i)
{
val_ = i.val_;
T_OUT_FROMTO("Copy operator= called");
return *this;
}
VerboseStr& operator=(VerboseStr&& i) noexcept
{
val_ = std::move(i.val_);
T_OUT_FROMTO("Move operator= called");
return *this;
}
const std::string ToStr() const { return std::string("[") + std::to_string(id_) + "] " + val_; }
void SetStr(const std::string& val) { val_ = val; }
};
/////////////////////////////////////////////////////////
// Capturing args by VALUES in lambda
template<typename Fn, typename... Args>
void RunAsync_V(Fn&& func, Args&&... args)
{
auto t = std::thread([func_ = std::forward<Fn>(func), args...]()
{
T_SLEEP(1000); // "Guarantees" async execution
func_(args...);
});
t.detach();
}
void DealWithVal(VerboseStr str)
{
std::cout << "Str copy: " << str.ToStr() << std::endl;
}
void DealWithRef(VerboseStr& str)
{
std::cout << "Str before change: " << str.ToStr() << std::endl;
str.SetStr("Changed");
std::cout << "Str after change: " << str.ToStr() << std::endl;
}
// It's "OK", but leads to 2 calls of copy constructor
// Replacing 'str' with 'std::move(str)' leads to no changes
void Test1()
{
VerboseStr str("First example");
RunAsync_V(&DealWithVal, str);
}
// It's OK
void Test2()
{
VerboseStr str("Second example");
RunAsync_V(&DealWithRef, std::ref(str));
// Waiting for thread to complete...
T_SLEEP(1500);
// Output the changed value of str
std::cout << "Checking str finally: " << str.ToStr() << std::endl;
}
int main()
{
Test1();
// Test2();
T_SLEEP(3000); // Give a time to finish
}
As said in comment above, the problem is in Test1()
function.
It is obvious that in the context of Test1()
, the only possible way to asynchronously call a function DealWithVal
is to "move" str
to a lambda body.
When Test1()
called from main()
, the output is following:
[1] Create constructor called
[1->2] Copy constructor called
[2->3] Move constructor called
[2] Destructor called
[1] Destructor called
[3->4] Copy constructor called
Str copy: [4] First example
[4] Destructor called
[3] Destructor called
As we can see, there are 2 calls of copy constructor.
I cannot realize how to implement this, considering that the passing by value (without moving) and by reference (take a look at Test2()
) should be available as well.
Please help sort out the problem. Thanks in advance.
Your two copies are coming from:
auto t = std::thread([func_ = std::forward<Fn>(func), args...]()
// ^^^^^^^ here
{
T_SLEEP(1000); // "Guarantees" async execution
func_(args...);
// ^^^^^^^ and here
You copy into the lambda, which is good, and which is all you can do since you were passed an lvalue reference. Then you copy the args from the lambda into the by-value function. But by design, you own the args now and the lambda has no further purpose, so you should move them from the lambda. i.e.:
auto t = std::thread([func_ = std::forward<Fn>(func), args...]() mutable
// ^^^^^^^
{
T_SLEEP(1000); // "Guarantees" async execution
func_(std::move(args)...);
// ^^^^^^^^^^^^^^^
This reduces you to the one necessary copy.
The other answer implicitly wraps passed lvalues in reference_wrappers to get zero copies. The caller is required to maintain the value for the async lifetime, but there is no explicit record of that at the call-site. It runs counter to expectations from analogous functionality (e.g. std::thread applies a decay_copy and requires the caller to wrap in a reference wrapper if that's what they want).