Search code examples
cachingcpu-architecturelatencycpu-cache

How long does it take to fill a cache line?


Assuming a cache line is 64 bytes,

100 nanoseconds is the often quoted figure for main memory access, is this figure for 1 byte at a time or for 64 bytes at a time?


Solution

  • It's for a whole cache line, of course.

    The busses / data-paths along the way are at least 8 bytes wide at every point, with the external DDR bus being the narrowest. (Possibly also the interconnect between sockets on a multi-core system.)

    The "critical word" of the cache line might arrive a cycle or two before the rest of it on some CPUs, maybe even 8 on an ancient Pentium-M, but on many recent CPUs the last step between L2 and L1d is a full 64 bytes wide. To make best use of that link (for data going either direction), I assume the L2 superqueue waits to receive a full cache line from the 32-byte ring bus on Intel CPUs, for example.

    Skylake for example has 12 Line Fill Buffers, so L1d cache can track cache misses on up to 12 lines in flight at the same time, loads+stores. And the L2 Superqueue has a few more entries than that, so it can track some additional requests created by hardware prefetching. Memory-level parallelism (as well as prefetching) is very important in mitigating the high latency of cache misses, especially demand loads that miss in L3 and have to go all the way to DRAM.


    For some actual measurements, see https://www.7-cpu.com/cpu/Skylake.html for example, for Skylake-client i7-6700 with dual-channel DDR4-2400 CL15.

    Intel "server" chips, big Xeons, have significantly higher memory latency, enough that it seriously reduces the memory (and L3) bandwidth available to a single core even if the others are idle. Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?

    Although I haven't heard if this has improved much with Ice Lake-server or Sapphire Rapids; it was quite bad when they first switched to a mesh interconnect (and non-inclusive L3) in Skylake-server.