profiling, intel, tbb, instructions

How to count TBB processing instructions?


Intel TBB suggests adjusting the grain size so that each piece of work takes about 10,000 to 100,000 processing instructions, for the most efficient parallelism. However, there is no guideline as to what counts as a processing instruction. Do I count summations, assignments, multiplications, comparisons, etc.? And if I do, what are the weights of these operations? Are there any profiling tools that count processing instructions the way TBB means?


Solution

  • It is a very rough recommendation, meant to give an idea of a reasonable execution time for one piece of computational work. The idea is that a task should not be too small (otherwise the scheduling overhead dominates the useful work), and there is no benefit from tasks that are too large (fewer chunks leave less room for load balancing across threads). Usually, you do not need to worry about this rule if you use a parallel algorithm with the default partitioner (auto_partitioner).

    In some cases (e.g. when you need to use simple_partitioner), you can measure the serial execution time of the algorithm and multiply it by the frequency of your CPU. This gives a rough estimate of the number of "instructions" (more precisely, clock ticks) for the whole problem, so you can divide the problem into pieces of the recommended size.

    As for tools, I suppose there are many profiling tools that can measure the execution time (or count the CPU instructions) of your application on a particular platform (see List of performance analysis tools). In addition, you can try Intel VTune Amplifier, which can estimate the overhead introduced by Intel TBB (the tool has special support for TBB-based applications) and help you understand whether the application uses TBB efficiently.
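The trade-off between tasks that are too small and tasks that are too large can be made concrete by counting chunks. The following is a hypothetical, TBB-free model (Chunk, split_range and make_chunks are illustrative names, not TBB API): it recursively halves a range while the range is larger than the grain size, roughly the way a blocked_range is split under simple_partitioner. It is a sketch for intuition, not TBB's actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Half-open range of loop iterations [begin, end).
struct Chunk {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical model (no TBB): keep halving [begin, end) while it is larger
// than `grainsize`, then emit it as one chunk of work.
void split_range(std::size_t begin, std::size_t end, std::size_t grainsize,
                 std::vector<Chunk>& out) {
    if (end - begin > grainsize) {
        std::size_t mid = begin + (end - begin) / 2;
        split_range(begin, mid, grainsize, out);
        split_range(mid, end, grainsize, out);
    } else {
        out.push_back({begin, end});
    }
}

// Splits n iterations into chunks; a grain size of 0 is treated as 1
// so that the recursion always terminates.
std::vector<Chunk> make_chunks(std::size_t n, std::size_t grainsize) {
    std::vector<Chunk> chunks;
    if (n > 0) {
        split_range(0, n, grainsize == 0 ? 1 : grainsize, chunks);
    }
    return chunks;
}
```

With a grain size of 1, a loop of 1,000,000 iterations becomes 1,000,000 tiny chunks, each paying scheduling overhead; with a grain size equal to the whole range there is a single chunk, so only one thread does any work. In between, e.g. n = 1000 with a grain size of 100, the model produces 16 chunks of 62 or 63 iterations.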
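The "serial time multiplied by CPU frequency" estimate for the simple_partitioner case can be sketched as follows. This is a minimal sketch under stated assumptions: measure_serial_seconds, estimate_total_cycles and estimate_grain_size are hypothetical helpers, the CPU frequency is a number you supply yourself, every loop iteration is assumed to cost roughly the same, and the default target of 100,000 ticks per chunk is simply the upper end of the TBB rule of thumb.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Times one serial run of `work` in seconds using a monotonic clock.
template <typename F>
double measure_serial_seconds(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Rough estimate of the clock ticks for the whole problem:
// serial time (s) * CPU frequency (Hz).
double estimate_total_cycles(double serial_seconds, double cpu_hz) {
    assert(serial_seconds > 0.0 && cpu_hz > 0.0);
    return serial_seconds * cpu_hz;
}

// Hypothetical helper: iterations per chunk so that one chunk costs about
// `target_cycles` ticks, assuming all iterations cost roughly the same.
std::size_t estimate_grain_size(double serial_seconds, double cpu_hz,
                                std::size_t n_iterations,
                                double target_cycles = 100000.0) {
    assert(n_iterations > 0);
    double cycles_per_iteration =
        estimate_total_cycles(serial_seconds, cpu_hz) / static_cast<double>(n_iterations);
    double grain = target_cycles / cycles_per_iteration;
    // Keep at least one iteration per chunk and at most the whole range.
    grain = std::min(std::max(grain, 1.0), static_cast<double>(n_iterations));
    return static_cast<std::size_t>(std::llround(grain));
}
```

For example, if 1,000,000 iterations take 0.1 s serially on an assumed 3 GHz CPU, that is about 3 * 10^8 ticks, or 300 ticks per iteration, so a chunk of about 333 iterations costs roughly 100,000 ticks. With simple_partitioner that value could then be passed as the grain size, e.g. tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n, grain), body, tbb::simple_partitioner()). Frequency scaling and turbo modes make the estimate rough, which matches the "very rough recommendation" spirit of the rule.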
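Regarding tools that count CPU instructions: on Linux, profilers such as perf read hardware counters through the perf_event_open system call. The sketch below assumes Linux with hardware performance counters available (they are often unavailable in VMs and containers, or restricted by /proc/sys/kernel/perf_event_paranoid), and count_instructions is a hypothetical helper name. It counts the user-space machine instructions retired on the calling thread while one function runs. Because it counts machine instructions rather than source-level additions or comparisons, no per-operation weights are needed.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Counts user-space instructions retired by the calling thread while `work`
// runs. Returns -1 if the hardware counter cannot be opened or read.
template <typename F>
long long count_instructions(F&& work) {
    perf_event_attr pe{};  // zero-initialize every field, including bitfields
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;        // start disabled; enabled explicitly below
    pe.exclude_kernel = 1;  // count user-space instructions only
    pe.exclude_hv = 1;

    // pid = 0, cpu = -1: measure the calling thread on whichever CPU it runs.
    int fd = static_cast<int>(syscall(SYS_perf_event_open, &pe, 0, -1, -1, 0));
    if (fd == -1) {
        return -1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = 0;
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    return n == static_cast<ssize_t>(sizeof(count)) ? count : -1;
}
```

One way to use it is to run the body of your loop over one chunk's worth of iterations inside count_instructions and compare the result against the 10,000 to 100,000 range. Note that this only measures the calling thread, so it is meant for a serial measurement of one chunk, not for the whole parallel run.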