Why does my OpenMP app sometimes use only 1 thread, sometimes 3, sometimes all cores?


I noticed that my OpenMP-enabled app sometimes only uses 1 thread.

  • If I wait a couple minutes, it uses more threads (e.g. 3).
  • If I wait 15 minutes, it uses all threads.

Why?


Solution

    Quick answer

    You may be using OpenCV, which turns on OpenMP dynamic thread adjustment (the equivalent of setting OMP_DYNAMIC) by calling omp_set_dynamic(). GCC's libgomp implements that setting based on the 15-minute load average (a bad idea in my opinion).

    Full answer

    The default libgomp implementation of OMP_DYNAMIC / omp_set_dynamic() lives, e.g. on Linux, in libgomp/config/linux/proc.c:

    /* When OMP_DYNAMIC is set, at thread launch determine the number of
       threads we should spawn for this team.  */
    /* ??? I have no idea what best practice for this is.  Surely some
       function of the number of processors that are *still* online and
       the load average.  Here I use the number of processors online
       minus the 15 minute load average.  */
    
    unsigned
    gomp_dynamic_max_threads (void)
    {
      // ... (body elided)
      return n_onln - loadavg;
    }
    
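
    The calculation described in that comment can be sketched outside of libgomp to see what limit a process would get right now. This is a rough approximation, not the libgomp source: the function names are made up for illustration, and both the truncation of the load average and the floor of 1 thread are assumptions.

```c
#define _DEFAULT_SOURCE   /* for getloadavg() on glibc in strict modes */
#include <stdlib.h>       /* getloadavg (glibc/BSD extension) */
#include <unistd.h>       /* sysconf */

/* Hypothetical helper: "processors online minus 15-minute load average".
   Assumption: the load average is truncated and the result is never below 1. */
unsigned dynamic_max_threads(unsigned n_onln, double loadavg15)
{
    if (loadavg15 < 0.0)
        loadavg15 = 0.0;
    if (loadavg15 >= (double) n_onln)
        return 1;
    return n_onln - (unsigned) loadavg15;
}

/* Hypothetical helper: apply the formula to the current machine state. */
unsigned current_dynamic_max_threads(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    double avg[3];

    if (n < 1)
        n = 1;
    /* avg[2] is the 15-minute load average; without it, assume no load. */
    if (getloadavg(avg, 3) != 3)
        return (unsigned) n;
    return dynamic_max_threads((unsigned) n, avg[2]);
}
```

    Under these assumptions, dynamic_max_threads(8, 3.2) gives 5, and any 15-minute load average of 8 or more on an 8-core machine gives 1.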

    OMP_DYNAMIC is really bad

    • Because of this logic, your app will use only 1 thread, even though the system is completely idle now, just because it was busy 10 minutes ago.
    • The dynamic limit is determined at process start (loading time), and fixed forever.
      • So a program stays slow forever if it happened to be started 5 minutes after the system was busy.
    • It means a server can never achieve full utilisation when working down a queue of jobs.
      • Say you have 8 cores, and a queue of N jobs to process, each of which takes 15 minutes full-CPU.
      • The first job starts at a 15-minute load average of 0, thus using all cores.
      • The next job starts, using only 1 core.
      • The next job starts, using only 7 cores.
      • The next job starts, using only 1 core.
      • The next job starts, using only 7 cores.
      • ...
      • In the long run, the server uses only half of its cores on average.
    • It makes performance behaviour completely irreproducible.

    None of this behaviour is documented in libgomp or the OpenMP spec for OMP_DYNAMIC.

    Those docs make the behaviour sound nicely "runtime-dynamic", when in fact it is fixed across the process's lifetime and based on ultra-slow rolling averages.

    In my opinion, somebody should file a GCC bug for this.

    OpenCV

    Unfortunately the popular OpenCV library turned this on by default in 2018.

    omp_set_dynamic() is called at program loading time, as revealed by gdb:

    (gdb) break omp_set_dynamic
    
    Breakpoint 1, 0x00007ffff706c330 in omp_set_dynamic () from /nix/store/7c0yrczwxn58f9gk9ipawdh607vh067k-gcc-12.2.0-lib/lib/libgomp.so.1
    (gdb) bt
    #0  0x00007ffff706c330 in omp_set_dynamic () from /nix/store/7c0yrczwxn58f9gk9ipawdh607vh067k-gcc-12.2.0-lib/lib/libgomp.so.1
    #1  0x00007ffff6c7ed21 in _GLOBAL__sub_I_parallel.cpp () from /nix/store/z2sc2pxysa8shfs9laj0xsmj87qhaq5h-opencv-4.7.0/lib/libopencv_core.so.407
    #2  0x00007ffff7fcefae in call_init () from /nix/store/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8/lib/ld-linux-x86-64.so.2
    #3  0x00007ffff7fcf09c in _dl_init () from /nix/store/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8/lib/ld-linux-x86-64.so.2
    #4  0x00007ffff7fe4a80 in _dl_start_user () from /nix/store/46m4xx889wlhsdj72j38fnlyyvvvvbyb-glibc-2.37-8/lib/ld-linux-x86-64.so.2
    

    So if your app dynamically links against OpenCV, this happens.

    I filed a bug to disable it: https://github.com/opencv/opencv/issues/25717
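
    To check whether a given process is affected, one option is to ask the OpenMP runtime directly. The sketch below uses the standard omp_get_dynamic() and omp_get_num_threads() calls; the helper names are hypothetical, and the code falls back to fixed values when compiled without -fopenmp. It only tells you something about OpenCV if it runs inside a program that actually links against OpenCV.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: 1 if dynamic adjustment is on, 0 if off,
   -1 if this file was compiled without OpenMP support. */
int dynamic_adjustment_state(void)
{
#ifdef _OPENMP
    return omp_get_dynamic() ? 1 : 0;
#else
    return -1;
#endif
}

/* Hypothetical helper: number of threads the runtime hands to a
   parallel region at the moment of the call (1 without OpenMP). */
int observed_team_size(void)
{
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread records the team size; the others wait at the
           implicit barrier at the end of the single construct. */
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

    If dynamic_adjustment_state() returns 1 and observed_team_size() is lower than the number of cores on an idle machine, the load-average logic above is a likely suspect.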

    Workaround for OpenCV

    Set this environment variable when starting your program:

    OPENCV_FOR_OPENMP_DYNAMIC_DISABLE=1  yourprogram...
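
    If setting an environment variable is not practical, a possible alternative is to switch dynamic adjustment off again from your own code with the standard omp_set_dynamic(0) call, early in main(). This assumes nothing re-enables it after OpenCV's static initialiser has run, which is an assumption rather than something confirmed here; the helper name is hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: turn OpenMP dynamic thread adjustment off.
   Returns the resulting state: 0 if off, 1 if still on,
   -1 if compiled without OpenMP support. */
int disable_dynamic_threads(void)
{
#ifdef _OPENMP
    omp_set_dynamic(0);
    return omp_get_dynamic() ? 1 : 0;
#else
    return -1;
#endif
}
```

    Call disable_dynamic_threads() before the first parallel region and check that it does not return 1.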