caching, cpu-architecture, cpu-cache

hit ratio in cache - reading long sequence of bytes


Let's assume that one cache line is 2^n bytes. What hit ratio should we expect when reading a long contiguous block of memory sequentially, byte by byte?

To my eye it is (2^n - 1) / 2^n.

However, I am not sure if I am right. What do you think?
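
To make the counting argument concrete, here is a toy simulation of the model I have in mind: an idealized cache with 2^n-byte lines, no prefetching, and nothing cached at the start (the line size n = 6 and the 1 MiB read length are just example values).

```c
/* Toy model: count hits and misses for sequential byte-by-byte reads,
 * assuming an idealized cache with 2^n-byte lines and no prefetching. */
#include <stdio.h>

int main(void) {
    const unsigned n = 6;                  /* example: 64-byte cache lines */
    const unsigned long bytes = 1UL << 20; /* read 1 MiB sequentially */

    unsigned long hits = 0, misses = 0;
    long cached_line = -1;                 /* sequential access only ever reuses the last line */
    for (unsigned long addr = 0; addr < bytes; addr++) {
        long line = (long)(addr >> n);     /* line index for this byte */
        if (line != cached_line) {         /* first byte of a new line: miss */
            misses++;
            cached_line = line;
        } else {                           /* remaining 2^n - 1 bytes: hits */
            hits++;
        }
    }

    /* Prints (2^n - 1) / 2^n, i.e. 63/64 = 0.984375 for n = 6. */
    printf("hit ratio = %lu / %lu = %f\n",
           hits, hits + misses, (double)hits / (double)(hits + misses));
    return 0;
}
```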


Solution

  • Yes, that looks right for simple hardware (non-pipelined, with no prefetching): e.g. 1 miss and 63 hits per 64B cache line.


    On real hardware, even in-order single-issue (non-superscalar) CPUs usually support miss under miss (multiple outstanding misses), so several misses can be in flight until the load buffers fill up. This pipelines memory accesses as well: misses to different cache lines can overlap instead of each one paying the full memory latency.

    Real hardware will also have hardware prefetching. For example, have a look at Intel's article about disabling HW prefetching for some use-cases.

    HW prefetching can probably keep up with a one-byte-at-a-time loop on most CPUs, so with good prefetching you might see hardly any L1 cache misses; a minimal loop you could measure yourself is sketched below.

    See Ulrich Drepper's What Every Programmer Should Know About Memory, and other links in the tag wiki for more about real HW performance.
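
    To actually observe this, a minimal microbenchmark sketch could look like the following (assumptions: a C compiler on Linux, 64-byte cache lines, and a 256 MiB buffer chosen to be much larger than any cache level; the volatile pointer stops the compiler from widening the byte loads into vector loads).

    ```c
    /* Microbenchmark sketch (assumes Linux + gcc/clang, 64-byte lines):
     * read a large buffer one byte at a time, then count cache misses
     * externally with a profiler such as perf. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t size = 256UL * 1024 * 1024;   /* much larger than any cache level */
        unsigned char *buf = malloc(size);
        if (!buf) return 1;

        /* Touch every byte once so the pages are really mapped. */
        for (size_t i = 0; i < size; i++)
            buf[i] = (unsigned char)i;

        /* The byte-by-byte sequential read from the question.  The volatile
         * pointer forces one 1-byte load per iteration instead of letting
         * the compiler combine them into wider/vector loads. */
        volatile unsigned char *p = buf;
        unsigned long long sum = 0;
        for (size_t i = 0; i < size; i++)
            sum += p[i];

        printf("sum = %llu\n", sum);
        free(buf);
        return 0;
    }
    ```

    On Linux you could then compare the model against reality with a profiler, e.g. perf stat -e L1-dcache-loads,L1-dcache-load-misses ./a.out (these generic cache events are not available on every CPU/kernel, and exact event names vary). With hardware prefetching enabled, the measured L1d miss rate is typically far below the 1-miss-per-64-bytes the simple model predicts; disabling the prefetchers, where the platform allows it, should bring it much closer.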