algorithm performance stencils intel-advisor roofline

Roofline model - how to calculate flop/byte ratio?


I would like to create a roofline model, but I am having trouble with the algorithm's flop/byte ratio. Can you explain how to calculate it? The algorithm performs its computation using a 5-point stencil.

Here is the algorithm:

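#include <algorithm>  // std::max / std::min over an initializer list
#include <utility>    // std::swap

// Assumption: max/min stand for std::max/std::min applied to five values.
// Pass 1: each interior point becomes the maximum of itself and its 4 neighbours.
for (int i = 1; i < m - 1; ++i) {
   for (int j = 1; j < n - 1; ++j) {
       outMax[i][j] = std::max({ inMax[i][j], inMax[i][j-1], inMax[i][j+1],
                                 inMax[i-1][j], inMax[i+1][j] });
   }
}
std::swap(inMax, outMax);  // the output becomes the input of the next sweep

// Pass 2: the same stencil, taking the minimum instead.
for (int i = 1; i < m - 1; ++i) {
   for (int j = 1; j < n - 1; ++j) {
      outMin[i][j] = std::min({ inMin[i][j], inMin[i][j-1], inMin[i][j+1],
                                inMin[i-1][j], inMin[i+1][j] });
   }
}
std::swap(inMin, outMin);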

Solution

  • Normally, a roofline is built per loop or per program, so I would consider the flop/byte ratio for the first loop and, separately, for the second loop.

    For each loop:

    1. Estimate the number of operations. For a roofline (and for flop/byte arithmetic intensity) this is normally the number of ALU operations (multiplications, additions, divisions, etc.) executed in a single iteration of your loop. In terms of hardware instructions, count the operations that do not generate MOV* or jump instructions. In your case you only have to count comparisons, since min/max come down to comparisons. The exact number of comparisons depends on the min()/max() implementation.

    2. Estimate how many bytes you read from inMax and write to outMax (or, for the second loop, inMin and outMin), again per single iteration. In your case you definitely read 5*sizeof(double) == 40 bytes, and you write at least one double. How memory is read and written inside min()/max() depends on its implementation.

    3. Divide the first value by the second. In your case flop/byte will probably be something like 0.1, depending on the min()/max() algorithm.
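The three steps above can be put into numbers with a small sketch. It assumes a straightforward max/min that needs k - 1 comparisons for k values, double elements, and no reuse of loaded values between iterations (5 loads and 1 store each time). The function names are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Step 1: comparisons needed by a simple max/min over k values
// (keep a running result and compare it against each remaining value).
std::size_t comparisons_for(std::size_t k) {
    assert(k >= 1);
    return k - 1;
}

// Step 3: arithmetic intensity = operations / bytes, both per loop iteration.
double arithmetic_intensity(std::size_t ops, std::size_t bytes) {
    assert(bytes > 0);
    return static_cast<double>(ops) / static_cast<double>(bytes);
}

// Per-iteration estimate for the 5-point stencil under the assumptions above.
double stencil_intensity() {
    const std::size_t points = 5;
    const std::size_t ops = comparisons_for(points);           // step 1
    const std::size_t bytes = (points + 1) * sizeof(double);   // step 2: 5 loads + 1 store
    return arithmetic_intensity(ops, bytes);
}
```

With 8-byte doubles this comes to 4 operations over 48 bytes, roughly 0.083, which is in line with the "something like 0.1" estimate. A different min()/max() implementation, or a compiler that maps the comparisons to other instructions, would change the operation count.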

    Meanwhile, automatic roofline generation (along with flop/byte metrics) for each loop and function in a C/C++/Fortran program is available as a first-class feature in Intel Advisor, starting with its 2017 version; see https://software.intel.com/en-us/articles/intel-advisor-roofline and https://www.codeproject.com/Articles/1169323/Intel-Advisor-

    [Figure: Intel Advisor Roofline. Each circle corresponds to a loop or function; the flop/byte ratio is on the horizontal axis.]

    Keep in mind that roofline variations differ in how they define the "byte" value.
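To illustrate how much the "byte" definition matters, here is a hedged sketch that evaluates the same stencil iteration under three traffic models. The model names and byte counts are assumptions made for this example, not definitions taken from any particular tool: counting every load and store, counting only DRAM traffic with ideal cache reuse (each input element fetched once, one store), and the same DRAM view with an extra read of the output line (write-allocate).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical traffic models for one 5-point stencil iteration.
enum class TrafficModel {
    AllAccesses,        // 5 loads + 1 store, every access counted
    DramIdealReuse,     // 1 new input element from DRAM + 1 store
    DramWriteAllocate   // as above, plus reading the output line before writing it
};

// Bytes moved per iteration under the chosen model.
std::size_t bytes_per_iteration(TrafficModel model, std::size_t elem_size) {
    switch (model) {
        case TrafficModel::AllAccesses:       return 6 * elem_size;
        case TrafficModel::DramIdealReuse:    return 2 * elem_size;
        case TrafficModel::DramWriteAllocate: return 3 * elem_size;
    }
    return 0;  // not reached for valid enum values
}

// Operations per byte for a given operation count and traffic model.
double intensity(std::size_t ops, TrafficModel model, std::size_t elem_size) {
    const std::size_t bytes = bytes_per_iteration(model, elem_size);
    assert(bytes > 0);
    return static_cast<double>(ops) / static_cast<double>(bytes);
}
```

For 4 comparisons and 8-byte doubles, the three models give about 0.083, 0.25, and 0.167 operations per byte respectively, so the same loop lands at noticeably different positions on the horizontal axis depending on which memory level the "byte" count refers to.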

    Working out the flop/byte ratio and roofline model for stencils is a very popular topic among roofline experts and developers. The links below offer plenty of stencil roofline examples to follow and re-apply to your particular case, whether or not you account for, say, DRAM versus L1 traffic:

    http://icsc2014.sjtu.edu.cn/wp-content/uploads/2014/05/Tutorial-Leopold1.pdf (especially from p. 17 onward)

    http://blogs.fau.de/hager/files/2014/05/Roofline_ECM_SPPEXA_PhD_2014.pdf