Search code examples
cachingcpu-architecturecpu-cache

The number of cache hits and cache misses in terms of L1, L2 and L3 caches


The following code will run on a CPU with the following cache structure:

  • L1 cache: 1KB

  • L2 cache: 8KB

  • L3 Cache: 64KB

  • Block size: 16B

     unsigned int A[65536];
     int i,j;
     for (i=0 ; i<65536-256 ; i++)
     for (j=1 ; j<128; j++)
     A[i]=A[i]+A[i+j];
    

I am studying for my midterm exam there is a question. Modify this code to minimise the number of penalties. Calculate the number of cache hits and cache misses in terms of L1, L2 and L3 caches.

I try to solve it via Loop Interchange. If I change to code like below, so there will be a sequential accesses instead of striding through memory every 16,364 word.

     unsigned int A[65536];
     int i,j;
     for (j=1 ; j<128; j++)
     for (i=0 ; i<65536-256 ; i++)
     A[i]=A[i]+A[i+j]; 

but I stuck with the cache hits and misses. Can someone explain to me?


Solution

  • Assuming unsigned int is 4 bytes, the size of your array is 4 * 65536 = 256KB. Your L3 cache can only hold upto 64KB.

    In order to minimize penalties, the first thing you should do is to break the loop into 4 sub groups, so that once you load an entry to L3, you should use it completely before being evicted.

    unsigned int A[65536];
    int i,j,k;
    for (k=0 ; k<65536-256; k+=16384)
        for (j=1 ; j<128; j++)
            for (i=k ; i<MIN(k+16384,65536-256) ; i++) //define a MIN function to return minimum
                A[i]=A[i]+A[i+j]; 
    

    Now to calculate cache hits and misses, a cacheline can hold 4 entries of the array. When you access A[0] for the first time, it will be a miss in L1, L2 and L3. When fetched from memory, you don't just fetch A[0], you will also fetch A[1], A[2] and A[3] as a cacheline can hold all 4 of them.

    In the same instruction, you also access A[i+j], in this case will be A[1] which will be a hit. So it goes like,

    First iteration
        A[i]   - A[0] - Miss
        A[i+j] - A[1] - Hit
    Second Iteration
        A[i]   - A[1] - Hit
        A[i+j] - A[2] - Hit
    Third Iteration
        A[i]   - A[2] - Hit
        A[i+j] - A[3] - Hit
    Forth Iteration
        A[i]   - A[3] - Hit
        A[i+j] - A[4] - Miss // This will cause to fetch A[4], A[5], A[6], A[7]
    

    An the pattern continues until L1 is filled.