Tags: performance, parallel-processing, cpu, intel, intel-vtune

VTune Amplifier XE for multicore CPUs?


I'm using Intel VTune Amplifier XE 2013 to profile a parallel program running on a multicore CPU. Specifically, the program is written in OpenCL and runs on a Xeon Phi. I would like to know exactly how to interpret the results VTune reports, i.e.:

  1. Is a reported value the performance counter collected for a single thread or for the whole core? (Assume the CPU has many cores and each core can run several threads concurrently, as on Xeon Phi.)
  2. How does VTune sample on a multicore CPU? Does it sample a single core and report that, or sample many cores and take the average?

Solution

  • VTune samples all cores by default on Xeon Phi, and the results can be viewed either aggregated or per core. Use the Grouping drop-down box in the Bottom-up tab of the GUI to control how the data is aggregated, and use "change Viewpoint" to switch between hotspots, event counts, and other available views.

    For more information on OpenCL analysis with VTune on Xeon Phi, please refer to the articles below:

    http://software.intel.com/en-us/articles/performance-tuning-of-opencl-applications-on-intel-xeon-phi-coprocessor-using-intel-vtune-amplifier-xe-2013

    http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding
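
To make the difference between the aggregated and per-core views concrete, here is a minimal Python sketch. It is not VTune code and does not read VTune output: the sample records, the function names `group_by_function` and `group_by_core`, and the kernel names are all hypothetical. It also assumes that the aggregated view sums event counts across cores rather than averaging them, which you should confirm against your own results.

```python
from collections import defaultdict

# Hypothetical sample records: (core_id, function_name, event_count).
# The numbers are invented for illustration; they are not VTune output.
samples = [
    (0, "kernel_a", 1200),
    (0, "kernel_b", 300),
    (1, "kernel_a", 1100),
    (1, "kernel_b", 500),
    (2, "kernel_a", 900),
]

def group_by_function(records):
    """Aggregated view: sum each function's counts over all cores (assumed)."""
    totals = defaultdict(int)
    for _core, func, count in records:
        totals[func] += count
    return dict(totals)

def group_by_core(records):
    """Per-core view: keep a separate function breakdown for each core."""
    per_core = defaultdict(lambda: defaultdict(int))
    for core, func, count in records:
        per_core[core][func] += count
    return {core: dict(funcs) for core, funcs in per_core.items()}

aggregated = group_by_function(samples)
per_core = group_by_core(samples)

print("aggregated:", aggregated)
print("per core:  ", per_core)
```

The same underlying samples feed both views; only the grouping key changes, which is roughly what switching the Grouping selection does in the GUI.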