Search code examples
cachingcomputer-sciencecpu-architecturecpu-cache

how do we calculate the number of reads/misses of the cache in this code snippet?


Given this code snippet from this textbook that I am currently studying. Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson) (global edition, so the book's exercises could be wrong.)

for (i = 31; i >= 0; i--) {
    for (j = 31; j >= 0; j--) {
        total_x += grid[i][j].x;
    }
}

for (i = 31; i >= 0; i--) {
    for (j = 31; j >= 0; j--) {
         total_y += grid[i][j].y;
    }
}

and this is the information given

The heart of the recent hit game SimAquarium is a tight loop that calculates the average position of 512 algae. You are evaluating its cache performance on a machine with a 2,048-byte direct-mapped data cache with 32-byte blocks (B = 32).

  struct algae_position {
         int x;
         int y;
  };
 
  struct algae_position grid[32][32];
  int total_x = 0, total_y = 0;
  int i, j;

You should also assume the following:

  • sizeof(int) = 4.
  • grid begins at memory address 0.
  • The cache is initially empty.
  • The only memory accesses are to the entries of the array grid.
  • Variables i, j, total_x, and total_y are stored in registers

The book gives the following questions as practice:

A. What is the total number of reads?    
Answer given : 2048  
B. What is the total number of reads that miss in the cache?
Answer given : 1024    
C. What is the miss rate?  
Answer given: 50%
  1. I'm guessing for A, the answer is derived from 32*32 *2? 32*32 for the dimensions of the matrix and 2 because there are 2 separate loops for x and y vals. Is this correct? How should the total number of reads be counted?

  2. How do we calculate the total number of misses that happen in the cache and the miss rate? I read that the miss rate is (1- hit-rate)


Solution

  • Question A

    You are correct about 32 x 32 x 2 reads.

    Question B

    The loops counts down from 31 towards 0 but that doesn't matter for this question. The answer is the same for loops going from 0 to 31. Since that is a bit easier to explain I'll assume increasing loop counters.

    When you read grid[0][0], you'll get a cache miss. This will bring grid[0][0], grid[0][1], grid[0][2] and grid[0][3] into the cache. This is because each element is 2x4 = 8 bytes and the block size is 32. In other words: 32 / 8 = 4 grid elements in one block.

    So the next cache miss is for grid[0][4] which again will bring the next 4 grid elements into the cache. And so on... like:

    miss
    hit
    hit
    hit
    miss
    hit
    hit
    hit
    miss
    hit
    hit
    hit
    ...
    

    So in the first loop you simply have:

    "Number of grid elements" divided by 4.
    

    or

    32 * 32 / 4 = 256
    

    In general in the first loop:

    Misses = NumberOfElements / (BlockSize / ElementSize)
    

    so here:

    Misses = 32*32 / (32 / 8) = 256
    

    Since the cache size is only 2048 and the whole grid is 32 x 32 x 8 = 8192, nothing read into the cache in the first loop will generate cache hit in the second loop. In other words - both loops will have 256 misses.

    So the total number of cache misses are 2 x 256 = 512.

    Also notice that there seem to be a bug in the book.

    Here:

    The heart of the recent hit game SimAquarium is a tight loop that calculates the
    average position of 512 algae.
                        ^^^
                        Hmmm... 512 elements...
    

    Here:

    for (i = 31; i >= 0; i--) {
        for (j = 31; j >= 0; j--) {
             ^^^^^^
             hmmm... 32 x 32 is 1024
    

    So the loop access 1024 elements but the text says 512. So something is wrong in the book.

    Question C

    Miss rate = 512 misses / 2048 reads = 25 %
    

    note:

    Being very strict we cannot say for sure that the element size is two times the integer size. The C standard allow that structs contain padding. So in principle there could be 8 bytes padding in the struct (i.e. element size being 16) and that would give the results that the book says.