c++ parallel-processing tbb boost-spirit-qi

TBB parallelization of parsing with boost::spirit::qi


In my program, I use Boost.Spirit.Qi to parse large data sets. The input data are sequential records. I am trying to use TBB to speed up the parsing. The parallel processing is done as follows:

typedef map<string, data_struct_t> mdata_t;
vector<string> text;
mdata_t data;

parallel_for(blocked_range<size_t>(0, input.size(), gs),
             [&](blocked_range<size_t>& r) {
    data_struct_t cs;
    mdata_t cr;
    string s;
    for (size_t i = r.begin(); i < r.end(); i++) {
        s = text[i];
        Parser::task1(s, cs);
        Parser::task2(s, cs);
        Parser::task3(s, cs);
        ....
        Parser::task8(s, cs);
        cr.insert(std::make_pair(cs.title, cs));
    }
    data.insert(cr.begin(), cr.end());
}, ap);

My program uses only about 10% of the CPU on a machine with 2 CPUs (16 cores in total), and it runs on only 8 cores, i.e. a single processor. I do not understand why the remaining 8 cores are not used. I would be grateful for pointers on how to parallelize this task correctly.

Thanks for the advice.

Stan


Solution

  • Your input.size() might be too small, or gs too big, for enough parallelism to be created. Otherwise, if the number of threads is the concern, check the process (affinity) mask your program starts with and how TBB is initialized (e.g. whether tbb::task_scheduler_init is created with a small number of threads).
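To check the first point (input size versus grain size), here is a rough sketch that models the splitting rule of tbb::blocked_range: a range is divisible while its size is larger than the grain size, and a split cuts it at the midpoint. The helper names count_chunks and can_feed_all_workers are made up for this illustration and are not part of TBB. The count is what you would get if every divisible range were actually split; a partitioner such as auto_partitioner may produce fewer chunks, so treat the result as an upper bound.

```cpp
#include <cstddef>

// Models how a blocked_range of `size` elements breaks down when every
// divisible range is split: divisible while size > grain, split at the
// midpoint. Returns the number of leaf chunks (an upper bound on how many
// pieces of work can exist at the same time).
std::size_t count_chunks(std::size_t size, std::size_t grain) {
    if (grain == 0) grain = 1;                 // a grain size must be positive
    if (size <= grain) return size > 0 ? 1 : 0;
    const std::size_t left = size / 2;         // left half gets floor(size / 2)
    return count_chunks(left, grain) + count_chunks(size - left, grain);
}

// Hypothetical helper: can this range keep `workers` threads busy at all?
bool can_feed_all_workers(std::size_t size, std::size_t grain,
                          std::size_t workers) {
    return count_chunks(size, grain) >= workers;
}
```

For example, 1000 records with gs = 500 give only 2 chunks, so at most two threads can ever be busy, while gs = 10 gives 128 chunks.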

    As for the low CPU utilization, it is expected when the work is I/O-bound, e.g. dominated by reading a file. It is also possible that the time needed to complete one parallel iteration differs a lot from one iteration to another. In that case, short iterations might finish even before all the threads have been created. (If you want to measure speedup accurately, wait until all the threads are running before you start timing.)

    Advice:

    You have a bug in data.insert: std::map is not safe for concurrent modification, and that call runs in every worker thread. Use tbb::concurrent_unordered_map, or simply tbb::parallel_reduce, to merge the partial results collected in cr by the different threads.
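The shape of the reduce-style fix can be sketched without TBB. The following uses plain std::thread as a stand-in so that it stays self-contained: each worker fills its own private map, and the maps are merged only after all workers have finished, so no map is ever modified by two threads at once. Record, parse_record and parse_all are hypothetical names; the real data_struct_t and Parser::task1 ... task8 are not shown in the question. With tbb::parallel_reduce the same two pieces would become the body (fill a local map for one sub-range) and the join step (merge two maps).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for data_struct_t.
struct Record {
    std::string title;
    std::string body;
};

// Hypothetical stand-in for the Parser::taskN chain: splits "title=body".
Record parse_record(const std::string& line) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos) return {line, ""};
    return {line.substr(0, eq), line.substr(eq + 1)};
}

using RecordMap = std::map<std::string, Record>;

RecordMap parse_all(const std::vector<std::string>& text, std::size_t workers) {
    workers = std::max<std::size_t>(1, workers);
    std::vector<RecordMap> partial(workers);   // one private map per worker
    std::vector<std::thread> pool;
    const std::size_t chunk = (text.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(text.size(), w * chunk);
        const std::size_t end = std::min(text.size(), begin + chunk);
        pool.emplace_back([&text, &partial, w, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                Record r = parse_record(text[i]);
                std::string key = r.title;
                partial[w].emplace(std::move(key), std::move(r));
            }
        });
    }
    for (std::thread& t : pool) t.join();

    // Sequential merge after the parallel phase; like the original
    // insert, the first record seen for a given title is kept.
    RecordMap merged;
    for (const RecordMap& m : partial) merged.insert(m.begin(), m.end());
    return merged;
}
```

The merge is sequential here, which is usually cheap compared with parsing; a tree-shaped merge, which is what a parallel reduction performs, only matters when the maps themselves are large.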

    The sequence Parser::task1(s, cs); ... Parser::task8(s, cs); can also be parallelized if the tasks do not share global state. See tbb::parallel_pipeline, which enables pipeline-style parallelism for a chain of such independent tasks.
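To make the pipeline idea concrete, here is a minimal two-stage sketch built only on the standard library, as a stand-in for what tbb::parallel_pipeline would arrange for you. While the second stage works on record i, the first stage can already work on record i+1. Channel, Item, stage_title, stage_length and run_pipeline are all hypothetical names for this illustration; the two stages merely play the role of two of the Parser::taskN calls, whose real behaviour is not shown in the question.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue that connects two pipeline stages.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }
    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item arrives; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Hypothetical per-record state handed from stage to stage.
struct Item {
    std::string line;
    std::string title;
    std::size_t length = 0;
};

// Hypothetical stages; each touches only the item it is given.
void stage_title(Item& it) { it.title = it.line.substr(0, it.line.find('=')); }
void stage_length(Item& it) { it.length = it.line.size(); }

std::vector<Item> run_pipeline(const std::vector<std::string>& text) {
    Channel<Item> first_to_second;
    std::vector<Item> out;

    std::thread first([&] {
        for (const std::string& line : text) {
            Item it;
            it.line = line;
            stage_title(it);
            first_to_second.push(std::move(it));
        }
        first_to_second.close();
    });
    std::thread second([&] {
        while (std::optional<Item> it = first_to_second.pop()) {
            stage_length(*it);
            out.push_back(std::move(*it));   // only this thread writes to out
        }
    });
    first.join();
    second.join();
    return out;
}
```

This only pays off when the stages really are independent, i.e. each one works on the item passed to it and on nothing shared; with one producer and one consumer the records also come out in input order.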