I have a thread pool that I use to execute many tiny jobs (millions of jobs, dozens/hundreds of milliseconds each). The jobs are passed in the form of either:
std::bind(&fn, arg1, arg2, arg3...)
or
[&](){fn(arg1, arg2, arg3...);}
with the thread pool taking them like this:
std::queue<std::function<void(void)>> queue;
void addJob(std::function<void(void)> fn)
{
    queue.emplace(std::move(fn)); // std::queue offers push()/emplace(), not emplace_back()
}
Pretty standard stuff, except that I've noticed a bottleneck: if jobs execute quickly enough (in less than a millisecond), the conversion from lambda/binder to std::function in addJob actually takes longer than executing the jobs themselves. From what I've read, std::function is notoriously slow, so the bottleneck isn't entirely unexpected.
Is there a faster way of doing this type of thing? I've looked into drop-in std::function replacements but they either weren't compatible with my compiler or weren't faster. I've also looked into "fast delegates" by Don Clugston but they don't seem to allow the passing of arguments along with functions (maybe I don't understand them correctly?).
I'm compiling with VS2015u3, and the functions passed to the jobs are all static, with their arguments being either ints/floats or pointers to other objects.
Have a separate queue for each of the task types - you probably don't have tens of thousands of task types. Each queue can be e.g. a static member of its task class. Then addJob() is actually the ctor of Task and it's perfectly type-safe.
Then define a compile-time list of your task types and visit it via template metaprogramming (for_each). It'll be way faster as you don't need any virtual call, function pointer or std::function<> to achieve this.
This will only work if the type-list code sees all the Task classes (so you can't e.g. add a new descendant of Task to an already running executable by loading the image from disc - hopefully that's a non-issue).
#include <cstddef>
#include <iterator>
#include <list>

template<typename D> // CRTP on D
class Task {
public:
    // you might want to static_assert at some point that D is in TaskTypeList
    Task() : it_(tasks_.end()) {} // call enqueue() in descendant
    ~Task() {
        // add your favorite lock here
        if (queued()) {
            tasks_.erase(it_);
        }
    }
    bool queued() const { return it_ != tasks_.end(); }
    static std::size_t ExecNext() {
        // add your favorite lock here (it has to cover the empty() check as well)
        if (!tasks_.empty()) {
            D* task = tasks_.front(); // copy the pointer: pop_front() invalidates iterators to it
            tasks_.pop_front();
            task->it_ = tasks_.end(); // mark as dequeued so ~Task() doesn't erase it again
            // release lock
            (*task)();
        }
        return tasks_.size();
    }
protected:
    void enqueue() // non-const: it updates it_
    {
        // add your favorite lock here
        tasks_.push_back(static_cast<D*>(this));
        it_ = std::prev(tasks_.end()); // iterator to the element just added
    }
private:
    typename std::list<D*>::iterator it_;
    static std::list<D*> tasks_; // you can have one per thread, too - then you don't need locking, but tasks are assigned to threads statically
};
template<typename D>
std::list<D*> Task<D>::tasks_; // definition of the static member
struct MyTask : Task<MyTask> {
    MyTask() { enqueue(); } // call enqueue only when the class is ready
    void operator()() { /* add task here */ }
    // ...
};
struct MyTask2; // etc.
template<typename...>
struct list_ {};
using TaskTypeList = list_<MyTask, MyTask2>;
void thread_process(list_<>) {} // end of recursion: no task types left
template<typename TaskType, typename... TaskTypes>
void thread_process(list_<TaskType, TaskTypes...>)
{
    TaskType::ExecNext();
    thread_process(list_<TaskTypes...>());
}
void thread_process(void*)
{
    for (;;) {
        thread_process(TaskTypeList());
    }
}
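The comment in Task mentions a static_assert that D is in TaskTypeList. Here is a minimal sketch of a trait that could back such a check. The name contains and the placeholder task types are made up for illustration; in the real code the assertion would sit inside a member function of Task (e.g. enqueue()), with the task types forward-declared so that TaskTypeList is visible at that point.

```cpp
#include <type_traits>

template<typename...>
struct list_ {};

// contains<T, List>::value is true when T appears in List.
// Hypothetical helper, not part of the code above.
template<typename T, typename List>
struct contains;

// Empty list: T was not found.
template<typename T>
struct contains<T, list_<>> : std::false_type {};

// Non-empty list: match the head or keep searching the tail.
template<typename T, typename Head, typename... Tail>
struct contains<T, list_<Head, Tail...>>
    : std::conditional<std::is_same<T, Head>::value,
                       std::true_type,
                       contains<T, list_<Tail...>>>::type {};

// Placeholder task types, only here to exercise the trait.
struct MyTask {};
struct MyTask2 {};
struct Unlisted {};
using TaskTypeList = list_<MyTask, MyTask2>;

static_assert(contains<MyTask2, TaskTypeList>::value,
              "MyTask2 must be listed in TaskTypeList");
static_assert(!contains<Unlisted, TaskTypeList>::value,
              "Unlisted is not in TaskTypeList");
```

This sticks to C++11 library features, so it should not need anything newer than what VS2015 provides.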
There's a lot to tune on this code: different threads should start from different parts of the queue (or one would use a ring, or several queues and either static/dynamic assignment to threads), you'd put a thread to sleep when there are absolutely no tasks, one could have an enum for the tasks, etc.
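As one possible take on the "sleep when there are no tasks" point, here is a rough sketch of a shared counter guarded by a condition variable. PendingTasks and its member names are assumptions made for this example, not part of the code above: enqueue() would call notify_added() after pushing, and a worker would call wait_and_claim() before each pass over the queues.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical helper: counts tasks queued across all per-type queues.
class PendingTasks {
public:
    // Call after a task has been pushed onto any queue.
    void notify_added() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one(); // wake one sleeping worker
    }

    // Blocks until at least one task is pending, then claims it.
    void wait_and_claim() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Number of tasks that were announced but not yet claimed.
    std::size_t pending() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t count_ = 0;
};
```

Because one pass over the type list can run more tasks than were claimed, the counter may overestimate the amount of queued work, which would cost an occasional empty pass. It also adds one more lock per job, so it is worth measuring before adopting it.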
Note that this can't be used with arbitrary lambdas: you need to list task types. You need to 'communicate' the lambda type out of the function where you declare it (e.g. by returning `std::make_pair(retval, list_)`), and sometimes that's not easy. However, you can always convert a lambda to a functor, which is straightforward - just ugly.
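To illustrate the lambda-to-functor conversion, here is a small sketch. The function accumulate and the struct AccumulateJob are invented for the example; the point is only that the lambda's captures become data members of a named type, which can then be put in the type list.

```cpp
// Stand-in for one of the static job functions (hypothetical).
static int g_total = 0;
static void accumulate(int a, int b) { g_total += a * b; }

// Lambda form: the closure type has no name that could be listed.
//   auto job = [a, b]() { accumulate(a, b); };

// Equivalent hand-written functor: each capture becomes a data member.
struct AccumulateJob {
    AccumulateJob(int a, int b) : a_(a), b_(b) {}

    // Same body as the lambda, reading the stored copies.
    void operator()() const { accumulate(a_, b_); }

private:
    int a_;
    int b_;
};
```

In the scheme above, such a functor would presumably also derive from Task<AccumulateJob> and call enqueue() at the end of its constructor, like MyTask does.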