Tags: assembly, x86, 64-bit, profiling, intel-vtune

How to restrict VTune analysis to a specific function


I have a program whose basic structure is as follows:

<C language headers>
int main(void) {
    /* some malloc() allocations and file reads into these buffers */
    /* call to an assembly language routine that needs to be optimized to the maximum */
    /* write the output back to files and free() the buffers */
    exit(0);
}

The assembly language routine essentially computes the checksum of the data in the buffer, and my intention is to optimize it to the absolute maximum. It does not make any system calls or library function calls.

I have just installed the Intel VTune Amplifier XE suite into VS 2015.

How do I tell VTune to focus strictly on the assembly language routine and skip all analysis of the preparatory C code? At the moment all the data I get, such as instruction count or CPI, is accumulated over the whole program. Is it possible to get the data only for the loops and branches within the assembly language subroutine? If so, please advise how I could do that.

Thanks


Solution

  • You can instrument your code with the ITT API that VTune provides (declared in ittnotify.h) to analyze specific regions of your workload. Use the Task API to track thread-specific activities, or the Frame API to analyze global stages of the workload.

    When configuring the analysis type, select the "Analyze user tasks" option so that the instrumented tasks are handled. When collection has finished, choose a grouping that begins with Task or Frame to see performance data aggregated over your instrumented intervals. Your tasks/frames also appear in the timeline.

    As an example, you could change your code like this:

    <C language headers>
    #include "ittnotify.h"
    
    int main(void) {
    
      /* Create the domain and the task name once, outside the measured region. */
      __itt_domain* domain = __itt_domain_create("MyDomain");
      __itt_string_handle* task = __itt_string_handle_create("MyTask");
    
      /* some malloc() allocations and file reads into these buffers */
    
      /* Everything between task_begin and task_end is attributed to "MyTask". */
      __itt_task_begin(domain, __itt_null, __itt_null, task);
    
      /* call to the assembly language routine that needs to be optimized to the maximum */
    
      __itt_task_end(domain);
    
      /* write the output back to files and free() the buffers */
      exit(0);
    }
    

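    Don't forget to follow the basic ITT API configuration steps to compile this code: add the VTune include directory to the compiler's include path so that ittnotify.h is found, and link against the static libittnotify library that ships with VTune.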
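
For anyone who wants to try the instrumentation before wiring it into the real program, here is a minimal sketch of the same Task API pattern wrapped in a helper function. Several things in it are assumptions and not part of the original code: `checksum_standin` is a hypothetical plain-C placeholder for the assembly routine (a simple byte sum, not the asker's algorithm), `profiled_checksum` is a made-up wrapper name, and `USE_ITT` is a hypothetical build flag that lets the file compile even when ittnotify.h is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef USE_ITT            /* hypothetical flag: define it when building against VTune's ittnotify */
#include "ittnotify.h"
#endif

/* Hypothetical stand-in for the assembly routine: a plain byte sum. */
uint32_t checksum_standin(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += buf[i];
    return sum;
}

/* Calls the routine inside an instrumented task so VTune can group data by "MyTask". */
uint32_t profiled_checksum(const uint8_t *buf, size_t len)
{
    uint32_t result;

    assert(buf != NULL || len == 0);

#ifdef USE_ITT
    /* Create the domain and task name once and reuse them on later calls. */
    static __itt_domain *domain = NULL;
    static __itt_string_handle *task = NULL;
    if (domain == NULL) {
        domain = __itt_domain_create("MyDomain");
        task = __itt_string_handle_create("MyTask");
    }
    __itt_task_begin(domain, __itt_null, __itt_null, task);
#endif

    /* Only this call sits between task_begin and task_end. */
    result = checksum_standin(buf, len);

#ifdef USE_ITT
    __itt_task_end(domain);
#endif

    return result;
}
```

In the real program you would call the wrapper from main() after the file reads and before the write-back, and replace `checksum_standin` with the actual assembly routine. Without `USE_ITT` defined the wrapper simply forwards the call, so the same source can be built with or without VTune installed.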