c++ openmp cpu-usage

OpenMP has low CPU usage


My OpenMP implementation performs very poorly. When I profile it with VTune, it shows very low CPU usage, and I can't tell why. Does anyone have an idea?

Hardware:

  • NUMA architecture with 28 cores (56 Threads)

Implementation:

struct Lineitem {
    int64_t l_quantity;
    int64_t l_extendedprice;
    float l_discount;
    unsigned int l_shipdate;
};

Lineitem* array = (Lineitem*)malloc(sizeof(Lineitem) * array_length);

// ... array is filled here ...
// sum, array_length, date1 and date2 are declared elsewhere (not shown)

#pragma omp parallel for num_threads(48) shared(array, array_length, date1, date2) reduction(+: sum)
for (unsigned long i = 0; i < array_length; i++)
{
    if (array[i].l_shipdate >= date1 && array[i].l_shipdate < date2 &&
        array[i].l_discount >= 0.08f && array[i].l_discount <= 0.1f &&
        array[i].l_quantity < 24)
    {
        sum += (array[i].l_extendedprice * array[i].l_discount);
    }
}

For reference, I am building with CMake and Clang.


Solution

  • I found the cause of the poor OpenMP performance: the OpenMP code runs inside a thread that is pinned to a single core. If I don't pin that thread, the OpenMP code is fast.

    Most likely the threads OpenMP creates from the pinned thread inherit its pinning (on Linux, a new thread inherits the CPU affinity mask of the thread that created it). As a result, all of the OpenMP threads share one core, which is why the profiler reports such low CPU usage.
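The inheritance effect described above can be checked with a small Linux-only sketch. It assumes glibc's `sched_getaffinity` / `sched_setaffinity` and uses `std::thread` in place of an OpenMP runtime, so it shows the mechanism rather than the exact setup from the question. The helper names (`allowed_cpus`, `first_allowed_cpu`, `pin_to_cpu`, `child_allowed_cpus`) are hypothetical and not part of any library.

```cpp
// Linux-only sketch: a thread created by a pinned thread starts out with the
// same one-CPU affinity mask. Helper names are made up for illustration.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <thread>

// Number of CPUs the calling thread may run on, or -1 on error.
int allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}

// Index of the first CPU in the calling thread's affinity mask, or -1.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Restrict the calling thread to a single CPU.
bool pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Number of CPUs a freshly created child thread may run on.
int child_allowed_cpus() {
    int n = -1;
    std::thread t([&n] { n = allowed_cpus(); });
    t.join();
    return n;
}
```

A possible use: save the current mask with `sched_getaffinity`, call `pin_to_cpu(first_allowed_cpu())`, and observe that `child_allowed_cpus()` then reports 1. Restoring the saved mask with `sched_setaffinity` before entering the parallel region (or running the OpenMP code from an unpinned thread, as in the answer) lets newly created threads spread across cores again. Whether worker threads that an OpenMP runtime has already created pick up the wider mask depends on the runtime, so that part is an assumption to verify on the target system.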