c++ multithreading optimization intel-vtune

sched_yield flagged as top hotspot


While analyzing a program with VTune, I see that the libc function sched_yield is flagged as a significant hotspot.

As far as I can tell, this function is roughly responsible for context switching. I say roughly because this is the first time I have encountered it; my understanding is that it calls into the OS scheduler to support changing the active thread(s).

What does having sched_yield as a major hotspot mean for my program? Does it mean that I create more threads than I should, and the OS is busy juggling continuous context switches?

What would be a remedy for this situation? Maybe resort to more centralized thread pools to avoid over-spawning threads?

What should I analyze next? Are there "typical" next steps in this situation? VTune already suggests running a "threading" analysis.


Solution

  • What does having sched_yield as a major hotspot mean for my program?

    On Linux, sched_yield does not necessarily switch execution to another thread. The kernel does not deschedule the calling thread if no other thread is ready to run on the same CPU. That last part is important: the kernel will not migrate a ready-to-run thread from another CPU in response to this call. This is a design tradeoff, since sched_yield is meant to be a low-cost hint to the kernel.

    Since sched_yield may return immediately without doing anything, your code can end up behaving like a busy loop around this call, which shows up as a hotspot in the profile: the code loops around sched_yield many times without doing much else. Such spinning burns CPU cycles that could otherwise be spent on other threads and applications running on the system.
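    To make the pattern concrete, here is a minimal sketch of the kind of loop that tends to produce this profile. The names spin_until_set and count_yields_during are hypothetical helpers invented for illustration, not something taken from your code. The waiter does nothing but call sched_yield until a flag flips, and the returned counter shows how many times it went around the loop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

#include <sched.h>  // sched_yield (POSIX)

// Hypothetical helper: spins on sched_yield() until `flag` becomes true.
// Returns how many times sched_yield() was called while waiting.
std::uint64_t spin_until_set(const std::atomic<bool>& flag) {
    std::uint64_t yields = 0;
    while (!flag.load(std::memory_order_acquire)) {
        sched_yield();  // may return immediately if nothing else is runnable on this CPU
        ++yields;
    }
    assert(flag.load(std::memory_order_acquire));  // postcondition: the flag is set
    return yields;
}

// Hypothetical driver: lets a waiter thread spin for roughly `delay`
// and reports how many yield calls it made in that time.
std::uint64_t count_yields_during(std::chrono::milliseconds delay) {
    std::atomic<bool> flag{false};
    std::uint64_t yields = 0;

    std::thread waiter([&] { yields = spin_until_set(flag); });

    std::this_thread::sleep_for(delay);           // the "work" the waiter is waiting for
    flag.store(true, std::memory_order_release);  // release the waiter
    waiter.join();                                // join makes `yields` safe to read

    return yields;
}
```

    On a machine with a free core, a call such as count_yields_during(std::chrono::milliseconds(100)) will typically report a very large count, and all of that time is attributed to sched_yield in a profiler.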

  • What would be a remedy for this situation?

    This depends a lot on your use case. Using sched_yield may be acceptable when you are willing to waste some CPU cycles in exchange for better latency. That should be a conscious decision, and even then I would recommend benchmarking an alternative solution that blocks threads properly. The Linux thread scheduler is quite efficient, so blocking and waking threads is not as expensive as on some other systems.

    sched_yield is often used in custom spin lock algorithms. I would recommend replacing these with pthread primitives, in particular pthread_cond_t, which let you properly block and wake up threads. If you're using C++, there are equivalents in the standard library (e.g. std::condition_variable). In other cases it may be worth exploring other blocking APIs, such as select and epoll. The exact solution depends on your use case.
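    As a rough illustration of the blocking approach, the sketch below replaces a yield loop with std::condition_variable. The Event class and the hand_off function are hypothetical names made up for this example; your actual synchronization needs may call for a different shape (a queue, a semaphore, and so on). The point is only that the waiting thread sleeps inside wait() instead of spinning.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event: wait() blocks until set() has been called.
class Event {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_ = true;
        }
        cv_.notify_all();  // wake any thread blocked in wait()
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wakeups and against
        // set() having been called before wait() started.
        cv_.wait(lock, [this] { return ready_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool ready_ = false;
};

// Hypothetical usage: hand a value to a consumer thread that blocks
// (instead of spinning) until the value is available.
int hand_off(int value) {
    Event event;
    int slot = 0;
    int seen = 0;

    std::thread consumer([&] {
        event.wait();  // sleeps in the kernel, burns no CPU while waiting
        seen = slot;
    });

    slot = value;  // written before set(), so the consumer sees it after wait()
    event.set();
    consumer.join();

    assert(seen == value);  // sanity check on the hand-off
    return seen;
}
```

    With this shape the waiting thread no longer shows up as a sched_yield hotspot; whether latency is acceptable for your workload is something to benchmark, as noted above.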