Search code examples
cachingcpu-cache

Cache miss rate of array


I'm trying to figure out how to calculate the miss rate of an array. I have the answer, but I'm not understanding how the answer was arrived at. I have the following code:

int C[N1][N2];
int A[N1][N3];
int B[N3][N2];

initialize_arrays(A, B, C, N1, N2, N3);

for(i=0; i<N1; ++i)
   for(j=0; j<N2; ++j)
      for(k=0; k<N3, ++k)
         C[i][j] += A[i][k] * B[k][j];

I also have the following info: N1=N2=N3=2048 (what does this mean??). The processor has an L1 data cache of 32kB with line size of 64B (no L2 cache). (what is line size?)

I know the miss rate of array C is (N^2/16)/N^3. I know the formula is (total misses)/(total accesses). I see that there are N^3 total accesses, but how did they get the total misses?

I also know the miss rate of array B: (N^3)/(N^3) and A: (N^2/16)/N^3. Could someone please explain to me how they got the total misses here too?


Solution

  • Cache is always accessed / managed with the granularity of a line. Here the line size is 64B. That means that in one cache line there will be 16 elements in one cache line (64B / 4B = 16). Next thing that is important here is every 17th element is in a new line, and once a line is brought to cache, it can possibly give 15 hits.

    Having got that out of the way, let us look at the access pattern. A[0][0], B[0][0], C[0][0], A[0][1], B[1][0], C[0][0], A[0][2], B[2][0], C[0][0], ...... A[0][16], B[16][0], C[0][0], A[0][17], B[17][0], C[0][0], ...... and so on

    Now Since a line has 16 elements and it is safe to assume a row-major placement of elements in the memory A[0][0] - A[0][15] will belong to one the first cache line (let us call this AL1) and similarly let B[0][0] - B[0][15] belong to line BL1. From this information we can write the access pattern in terms of cache lines. AL1, BL1, CL1, AL1, BL129, CL1, AL1, BL257, CL1, ..... AL1, BL2049, CL1, AL2, BL2176, CL1,.... and so on.

    Now the cache size is 32kB that means the cache can hold 512 lines. Since this is L1, let us assume it is a two-way associative cache. Therefore there are 256 sets. This means that after every 256 lines of an array, the 257th line maps to the same set as the 1st line.

    The cache contents upon access to lines mapping to set 1 looks like this

    1st Iteration - inner

    Access to - AL1 - Miss
     AL1 
    Access to - BL1 - Miss
     AL1
     BL1
    Access to - CL1 - Miss
     CL1
     BL1
    

    2nd Iteration - inner

    Access to - AL1 - Miss
     CL1
     AL1
    Access to - CL1 - Hit
     CL1
     AL1
    

    3rd Iteration - inner

    Access to - AL1 - Hit
     CL1
     AL1
    Access to - BL257 - Miss
     BL257
     AL1
    Access to - CL1 - Miss
     BL257
     CL1
    

    4th Iteration - Inner

    Access to - AL1 - Miss
     AL1
     CL1
    Access to - CL1 - Hit
     AL1
     CL1
    

    5th Iteration - Inner

    Access to - AL1 - Hit
     CL1
     AL1
    Access to - BL513 - Miss
     BL513
     AL1
    Access to - CL1 - Miss
     BL513
     CL1
    

    . . .

    16th Iteration - Inner

    Access to - AL1 - Miss
     AL1
     CL1
    Access to - CL1 - Hit
     AL1
     CL1
    

    17th Iteration - Inner

    Access to - BL2049 - Miss
     BL2049
     CL1
    Access to - CL1 - Hit
     BL2049
     CL1
    

    18th Iteration - Inner

    Access to - CL1 - Hit
     BL2049
     CL1
    

    For the 1st iteration of the middle loop. We have for set 1,

    A -> M, M, H, M, H, M, H, M, H, M, H, M, H , M, H, M, ~ , ~ , ~ , ~ , ....... 
    => after 2048 iterations - 7 hits 9 misses, 16 accesses
    
    B -> M, ~ , M, ~ ,  M, ~ , ........
    => after 2048 iterations - 0 Hits, 1024 misses, 1024 accesses
    
    C -> M, H, M, H, M, H, M, H, M, H, M, H, M, H, M, H, H , H, H , H, ......
    => after 2048 iterations - 2040 hits, 8 misses, 2048 accesses
    

    for 2nd - 16th iteration of the middle loop,

    Exact hit / miss / access pattern as before.
    

    for 17th - 2048th iteration of the middle loop,

    A -> same as before
    B -> No accesses
    C -> No accesses
    

    To summarize - for 1 iteration of the outer loop, we have for set 1,

    A -> N*9 misses , N * 16 accesses
    B -> N/2 * 16 Misses , N/2 * 16 accesses
    C -> 8 * 16 Misses , N * 16 accesses
    

    In the outer loop, we can see that every alternate iteration will have the same hit / miss / access pattern as the 1st iteration (summarized above) for arrays C and A. Every iteration of the outer loop will have the same hit / miss / access pattern a the 1st iteration for array B.

    Hence, (Finally!)

    For Set 1, at the end of the program, we have

    A -> N^2 * 9 / 2 misses, N^2 * 8 accesses
    B -> N^2 * 8 misses, N^2 * 8 accesses
    C -> N * 64 misses, N^2 * 8 accesses
    

    The access pattern will be similar across all sets. Therefore by the end of the program we have, across all sets

    A -> N^2 * 1152 misses, N^3 accesses
    B -> N^3 misses, N^3 accesses
    C -> N^2 * 8 misses, N^3 accesses
    

    I know this is different from what is indicated in the question. I could not figure out how. I will be glad to hear other responses.