I have a performance-critical piece of C++ code, built with Visual Studio 2017, that I've been profiling to look for potential bottlenecks. At a high level, the profiler shows about 80% CPU usage across my eight cores while this code executes. With all the kernel symbols loaded, the profiler reports that the busiest function is NtYieldExecution, at 52% usage.
My guess is that this 52% figure is misleading (possibly it is 52% of a single thread), but even then I'd be keen to know what's going on under the hood. I also have my own thread pool code, which led to 100% CPU usage on other code, so I'm wondering whether to move this code to an alternative multi-threading model. OpenMP is very convenient, but is it inefficient in Visual Studio 2017? More importantly, is it possible to isolate and remove any such inefficiencies?
The problem, as it turned out, was that part of the multi-threaded code was inadvertently writing to a variable declared outside the scope of the OpenMP section. This in turn led to the automatic insertion of a lock, as seen in PartialBarrierN::Block. I resolved this by switching to a more local variable, which resulted in a significant speed-up and 100% CPU usage.
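For anyone hitting something similar, here is a minimal sketch of the kind of change described above. The original code isn't shown, so everything here is hypothetical: the function names, the sum-of-squares workload, and the explicit critical section in the first version, which only stands in for whatever was serialising the threads in my case. It sticks to OpenMP 2.0 features (signed int loop index), since that is what Visual Studio 2017 supports, and needs /openmp to actually run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical "before": every iteration writes to a variable declared
// outside the parallel region, so each update has to be serialised.
double sum_squares_shared(const std::vector<double>& data) {
    // OpenMP 2.0 requires a signed integer loop index.
    assert(data.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int n = static_cast<int>(data.size());
    double total = 0.0;  // shared by all threads
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        // Threads queue up here on every single iteration.
        #pragma omp critical
        total += data[i] * data[i];
    }
    return total;
}

// Hypothetical "after": each thread accumulates into a variable declared
// inside the parallel region and touches the shared one only once.
double sum_squares_local(const std::vector<double>& data) {
    assert(data.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int n = static_cast<int>(data.size());
    double total = 0.0;
    #pragma omp parallel
    {
        double local = 0.0;  // private to this thread
        #pragma omp for nowait
        for (int i = 0; i < n; ++i) {
            local += data[i] * data[i];
        }
        // One synchronised update per thread instead of one per iteration.
        #pragma omp critical
        total += local;
    }
    return total;
}
```

Both functions return the same result; the second just keeps the threads from contending on shared state inside the hot loop. For a plain sum like this, a reduction(+:total) clause on the parallel for would probably be the more idiomatic way to get the same effect.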