In my program I use Boost Spirit Qi to parse large data sets. The input data are sequential records. I am trying to use TBB to increase the efficiency of parsing. The parallel-processing procedure is as follows:
typedef map<string, data_struct_t> mdata_t;

vector<string> text;
mdata_t data;

parallel_for(blocked_range<size_t>(0, input.size(), gs),
    [&](blocked_range<size_t>& r) {
        data_struct_t cs;
        mdata_t cr;
        string s;
        for (size_t i = r.begin(); i < r.end(); i++) {
            s = text[i];
            Parser::task1(s, cs);
            Parser::task2(s, cs);
            Parser::task3(s, cs);
            ....
            Parser::task8(s, cs);
            cr.insert(std::make_pair(cs.title, cs));
        }
        data.insert(cr.begin(), cr.end());
    }, ap);
My program uses only 10% of the CPU (2 CPUs, 16 cores) and runs on only 8 cores. I do not understand why the remaining 8 cores (the second processor) are not used. I would be grateful if you could point me to the correct way to parallelize this task.
Thanks for the advice.
Stan
Your input.size() might be small, or gs might be too big, preventing the creation of enough parallelism. Otherwise, if the number of threads is the concern, check the process (affinity) mask of your program when you start it, and check how TBB is initialized (e.g. whether tbb::task_scheduler_init is created with a small number of threads).
As for CPU utilization, low utilization is expected when your work is I/O-bound, i.e. reading a file. It is also possible that the time needed to complete one parallel iteration differs a lot from iteration to iteration. In that case, small iterations might complete even before all the threads have been created. (You should explicitly wait until all the threads are operational if you want to measure speedup accurately.)
Advice:
You have a bug with data.insert, since std::map is not safe for concurrent modification. Use tbb::concurrent_unordered_map, or just tbb::parallel_reduce, in order to merge the partial results collected in cr on different threads.
The pattern Parser::task1(s, cs); ... Parser::task8(s, cs); can also be parallelized if the tasks do not share global state. See tbb::parallel_pipeline, which enables pipeline-style parallelism for a chain of such independent tasks.