Labeling data for Bag Of Words

I've been looking at this tutorial and the labeling part confuses me. Not the act of labeling itself, but the way the process is shown in the tutorial.

More specifically, the #pragma omp lines:

#pragma omp parallel for schedule(dynamic,3)
for(..loop a directory?..) {

   ...

   #pragma omp critical
   {
      if(classes_training_data.count(class_) == 0) { //not yet created...
         classes_training_data[class_].create(0,response_hist.cols,response_hist.type());
         classes_names.push_back(class_);
      }
      classes_training_data[class_].push_back(response_hist);
   }
   total_samples++;
}

The same goes for the code that follows it.

Could anyone explain what is going on here?

Solution

The pragmas are from OpenMP, a specification for a set of compiler directives, library routines, and environment variables that can be used to specify high-level parallelism in Fortran and C/C++ programs.

The #pragma omp parallel for schedule(dynamic,3) is a shorthand that combines several other pragmas. Let's see them:

#pragma omp parallel starts a parallel block. It creates a team of threads, and every thread in the team executes the next statement (usually a { } block).
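
To see that the block really runs once per thread, here is a small sketch (the function names are made up for illustration, not taken from the tutorial). It counts how many times the body of a parallel block executes and compares that with the team size reported by omp_get_num_threads(). The #ifdef _OPENMP guard lets it also compile without OpenMP, in which case there is a single thread.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Size of the current thread team (1 when compiled without OpenMP).
static int team_size() {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: counts how many times the body of a parallel block runs.
// Every thread in the team executes the block, so the count equals the team size.
int count_parallel_executions(int* threads) {
    int executions = 0;  // shared by all threads in the team
    int team = 1;
    #pragma omp parallel
    {
        #pragma omp atomic
        ++executions;        // atomic: several threads increment the same counter

        #pragma omp single
        team = team_size();  // only one thread needs to record the team size
    }
    *threads = team;
    assert(executions == team);
    return executions;
}
```

Build it with OpenMP enabled (e.g. -fopenmp with GCC or Clang) to get more than one thread; the OMP_NUM_THREADS environment variable changes the team size.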

You can also split a loop among those threads. Inside a parallel block, #pragma omp for divides the iterations of the following for-loop among the threads of the block, so each thread executes only its portion of the loop.

For example:

 #include <iostream>

 int main() {
   #pragma omp parallel
   {
     #pragma omp for
     for (int n = 0; n < 5; ++n) {
       std::cout << "Hello\n";
     }
   }
 }

This creates a parallel block whose threads share the five iterations of the for-loop. "Hello" is printed to standard output five times in total, in no specified order (for example, thread #3 may print its "Hello" before thread #1).
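
A small sketch (hypothetical function name, not from the tutorial) that records which thread ran each iteration of such a loop. It also checks that every iteration runs exactly once: the iterations are divided among the threads, not repeated on each of them.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of the calling thread (0 when compiled without OpenMP).
static int thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Hypothetical helper: runs a work-shared loop and returns, for each iteration,
// the number of the thread that executed it.
std::vector<int> which_thread_ran(int iterations) {
    std::vector<int> owner(iterations, -1);
    std::vector<int> runs(iterations, 0);
    #pragma omp parallel
    {
        #pragma omp for
        for (int n = 0; n < iterations; ++n) {
            owner[n] = thread_id();  // each n is handled by exactly one thread,
            ++runs[n];               // so these writes never touch the same element
        }
    }
    for (int r : runs)
        assert(r == 1);  // every iteration ran once: divided, not repeated
    return owner;
}
```

With several threads, owner typically holds a mix of thread numbers; without OpenMP every entry is 0.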

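You can also control how the iterations are split into chunks and which chunk each thread receives, using the schedule clause. The available policies include static, dynamic and guided; when no schedule is given, the default is implementation-defined (in practice usually static). Check this awesome answer regarding scheduling policies.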
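
For schedule(static, chunk) the OpenMP specification fixes the assignment: chunks are dealt out round-robin in thread-number order, so iteration i runs on thread (i / chunk) % team_size. The sketch below (helper names are hypothetical) checks exactly that. With schedule(dynamic, chunk) no such formula holds, because each chunk goes to whichever thread asks first.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of the calling thread (0 when compiled without OpenMP).
static int thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Size of the current thread team (1 when compiled without OpenMP).
static int team_size() {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: checks that schedule(static, chunk) assigns iteration i
// to thread (i / chunk) % team, i.e. chunks are dealt out round-robin.
bool static_schedule_is_round_robin(int iterations, int chunk) {
    assert(chunk > 0);
    std::vector<int> owner(iterations, -1);
    int team = 1;
    #pragma omp parallel
    {
        #pragma omp single
        team = team_size();  // actual team size, may be smaller than requested

        #pragma omp for schedule(static, chunk)
        for (int i = 0; i < iterations; ++i)
            owner[i] = thread_id();
    }
    for (int i = 0; i < iterations; ++i)
        if (owner[i] != (i / chunk) % team)
            return false;
    return true;
}
```

Static scheduling has almost no bookkeeping cost, which suits loops whose iterations take similar time; dynamic scheduling pays for its flexibility with a request per chunk.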

All of these pragmas can be combined into one:

#pragma omp parallel for schedule(dynamic,3)

which creates a parallel block that executes the for-loop with dynamic scheduling: the iterations are handed out in chunks of 3, so each thread executes 3 consecutive iterations before asking the scheduler for the next available chunk.
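
The chunking can be observed directly. This sketch (the function name is hypothetical) records the thread for each iteration under schedule(dynamic, 3) and checks that each chunk of 3 consecutive iterations ran on a single thread. It assumes the iteration count is a multiple of 3, so every chunk is full.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of the calling thread (0 when compiled without OpenMP).
static int thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Hypothetical helper: with schedule(dynamic, 3) a thread claims 3 consecutive
// iterations at a time, runs all of them, then asks for the next free chunk.
// Returns true if every chunk [3k, 3k + 3) ran on a single thread.
// Assumes iterations is a multiple of 3, so every chunk is full.
bool dynamic_chunks_stay_together(int iterations) {
    assert(iterations % 3 == 0);
    std::vector<int> owner(iterations, -1);
    #pragma omp parallel for schedule(dynamic, 3)
    for (int i = 0; i < iterations; ++i)
        owner[i] = thread_id();
    for (int i = 0; i < iterations; ++i)
        if (owner[i] != owner[i - i % 3])  // compare with the chunk's first iteration
            return false;
    return true;
}
```

Which thread gets which chunk changes from run to run; only the grouping into chunks of 3 is fixed. Dynamic scheduling fits the tutorial's loop because images can take very different times to process.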

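The critical pragma restricts execution of the following block to a single thread at a time. This is needed because the block modifies shared containers (classes_training_data is a std::map), and standard library containers are not safe to modify from several threads at once. In your example, only one thread at a time will execute this: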

   {
      if(classes_training_data.count(class_) == 0) { //not yet created...
         classes_training_data[class_].create(0,response_hist.cols,response_hist.type());
         classes_names.push_back(class_);
      }
      classes_training_data[class_].push_back(response_hist);
   }
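
Here is a self-contained sketch of the same pattern without OpenCV: a std::vector<int> stands in for the Mat histogram, and the class labels and the Grouped/group_samples names are made up. One thing worth noticing in the posted loop: total_samples++ sits outside the critical block. Assuming total_samples is a shared counter declared before the loop, concurrent increments can race, so the sketch protects it with #pragma omp atomic (a reduction(+:total_samples) clause on the loop would also work for a plain int variable).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for classes_training_data, classes_names and total_samples.
struct Grouped {
    std::map<std::string, std::vector<std::vector<int>>> by_class;
    std::vector<std::string> class_names;
    int total_samples = 0;
};

Grouped group_samples(int n_samples) {
    assert(n_samples >= 0);
    Grouped g;
    const char* labels[] = {"cat", "dog", "car"};  // hypothetical class names
    #pragma omp parallel for schedule(dynamic, 3)
    for (int i = 0; i < n_samples; ++i) {
        std::string class_ = labels[i % 3];
        std::vector<int> hist(4, i);               // stand-in for response_hist

        // Only one thread at a time may touch the shared map and vector.
        #pragma omp critical
        {
            if (g.by_class.count(class_) == 0)     // first sample of this class
                g.class_names.push_back(class_);
            g.by_class[class_].push_back(hist);
        }

        // A plain total_samples++ here would be a data race; atomic makes the
        // increment indivisible without taking the critical lock.
        #pragma omp atomic
        g.total_samples++;
    }
    return g;
}
```

The order of class_names depends on which thread reaches the critical block first, so it can differ between runs; the per-class counts do not.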

Here is an introduction to OpenMP 3.0.

Finally, the variables you mention are declared in the tutorial, just before the code you posted:

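vector<KeyPoint> keypoints;              // keypoints detected in the current image
Mat response_hist;                       // BoW histogram of the current image
Mat img;                                 // current training image
string filepath;                         // path of the current image file
map<string,Mat> classes_training_data;   // class name -> its histograms, one per row

// SURF keypoint detector
Ptr<FeatureDetector > detector(new SurfFeatureDetector());
// brute-force matcher using L2 distance
Ptr<DescriptorMatcher > matcher(new BruteForceMatcher<L2<float> >());
// SURF descriptors computed on the opponent color channels
Ptr<DescriptorExtractor > extractor(new OpponentColorDescriptorExtractor(Ptr<DescriptorExtractor>(new SurfDescriptorExtractor())));
// computes an image's BoW histogram by matching its descriptors to the vocabulary
Ptr<BOWImgDescriptorExtractor> bowide(new BOWImgDescriptorExtractor(extractor,matcher));
bowide->setVocabulary(vocabulary);       // vocabulary computed earlier in the tutorial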