openmp libgomp

Cannot understand how libgomp implements the FOR construct


According to the libgomp manual, code of the form:

#pragma omp parallel for
for (i = lb; i <= ub; i++)
  body;

becomes

void subfunction (void *data)
{
  long _s0, _e0;
  while (GOMP_loop_static_next (&_s0, &_e0))
  {
    long _e1 = _e0, i;
    for (i = _s0; i < _e1; i++)
      body;
  }
  GOMP_loop_end_nowait ();
}

GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
subfunction (NULL);
GOMP_parallel_end ();

I wrote a tiny program to debug, just to see how this implementation works:

int main(int argc, char** argv)
{
  int res = 1, i;  /* initialized so the loop does not read an indeterminate value */
  #pragma omp parallel for num_threads(4)
  for (i = 0; i < 400000; i++)
      res = res*argc;

  return 0;
}

Next, I ran gdb and set breakpoints on "GOMP_parallel_loop_static" and "GOMP_parallel_end". At that point the library was not loaded yet, so both breakpoints were pending. When I ran the test program inside gdb, I got the result below:

(gdb) run 2 1 6 5 4 3 8 7
Starting program: ./test 2 1 6 5 4 3 8 7
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff73c9700 (LWP 5381)]
[New Thread 0x7ffff6bc8700 (LWP 5382)]
[New Thread 0x7ffff63c7700 (LWP 5383)]

 Thread 1 "test" hit Breakpoint 2, 0x00007ffff7bc0c00 in GOMP_parallel_end () from /usr/lib/x86_64-linux-gnu/libgomp.so.1

As you can see, it reached the second breakpoint, in "GOMP_parallel_end", but never the first. I would like to know how this is possible when the libgomp manual clearly shows that "GOMP_parallel_loop_static" comes first.

Thank you.


Solution

  • That part of GCC's documentation has not been kept up to date, so it's probably a good idea to read it only as an approximation of what is actually happening. If you're interested in that level of detail, I suggest you look at the dump files generated by -fdump-tree-all and similar options.

    With a recent version of GCC, your example generates a call to __builtin_GOMP_parallel, which maps to GOMP_parallel. That one internally calls GOMP_parallel_end at the end, so that's what you're seeing, I suppose.

    void
    GOMP_parallel (void (*fn) (void *), void *data, unsigned num_threads, unsigned int flags)
    { 
      num_threads = gomp_resolve_num_threads (num_threads, 0);
      gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
      fn (data);
      ialias_call (GOMP_parallel_end) ();
    }
    

    Of course, patches to update the documentation will be gladly accepted. :-)