cpu-architecture cpu-cache

Why is data fetched from main memory under a write-allocate cache policy?


With a write-allocate cache policy, when a write miss occurs, the data is fetched from main memory and then updated as if it were a write hit.

My question is, assuming a write-back policy on write hits, why is the data read from main memory if it is immediately being updated by the CPU? Can't we just write to the cache without fetching the data from main memory?


Solution

  • On a store that hits in L1d cache on a line that is already exclusively owned (Modified or Exclusive state), you don't need to fetch or RFO anything.

    Normally you're only storing to one part of the full line, so you need a copy of the full line to have it in Modified state. You need to do a Read For Ownership (RFO) if you don't already have a valid Shared copy of the line. (A Shared copy can be promoted to Exclusive and then Modified just by invalidating other copies; see MESI.)

    A full-line store (like an x86 AVX-512 vmovdqa64 [rdi], zmm0 64-byte store) can just invalidate instead of doing a Read For Ownership, and then wait for an acknowledgement that no other cores have a valid copy of the line. IDK if that actually happens for AVX-512 stores specifically in current x86 microarchitectures.

    Skipping the read (and just invalidating any other copies) definitely does happen in practice in some CPUs in some cases, e.g. in the store protocol used by microcode to implement x86 rep stos and rep movs, which are basically memset / memcpy. For large counts they are definitely storing full lines, and it's worth it to avoid the memory traffic of reading first. See Andy Glew's comments, which I quoted in What setup does REP do? - he designed P6's (Pentium Pro) fast-strings microcode when he was working at Intel, and says it included a no-RFO store protocol.

    See also Enhanced REP MOVSB for memcpy
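
To make the partial-line vs. full-line distinction above concrete, here is a minimal toy model in C. It is only a sketch under loud assumptions: ToyLine, toy_store and toy_evict are hypothetical names invented for illustration, the model has a single line and no coherence traffic at all, and it is not how any real cache is implemented. It just shows that a write-back evicts the whole line, so a partial store on a miss needs the other bytes from memory, while a full-line store does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64  /* bytes per cache line (assumed; typical for x86) */

/* Hypothetical one-line "cache": just enough state to show the problem. */
typedef struct {
    bool    valid;
    bool    dirty;
    uint8_t data[LINE_SIZE];
    int     fills;  /* how many times the line had to be read from memory */
} ToyLine;

/* Write-allocate store of `len` bytes at `offset` within the line that
 * is backed by the LINE_SIZE bytes at `mem`. */
static void toy_store(ToyLine *line, const uint8_t *mem,
                      size_t offset, const uint8_t *src, size_t len)
{
    assert(offset + len <= LINE_SIZE);

    if (!line->valid) {                 /* write miss */
        if (len < LINE_SIZE) {
            /* Partial store: the bytes we are NOT writing must come
             * from memory, otherwise the line would hold garbage. */
            memcpy(line->data, mem, LINE_SIZE);
            line->fills++;
        }
        /* Full-line store: every byte is about to be overwritten,
         * so the read can be skipped. */
        line->valid = true;
    }

    memcpy(line->data + offset, src, len);  /* now an ordinary write hit */
    line->dirty = true;
}

/* Write-back eviction: the WHOLE line goes to memory, not just the
 * bytes that were stored. */
static void toy_evict(ToyLine *line, uint8_t *mem)
{
    if (line->valid && line->dirty)
        memcpy(mem, line->data, LINE_SIZE);
    line->valid = false;
    line->dirty = false;
}
```

In this model, storing 4 bytes at offset 8 into an invalid line costs one fill, and the later write-back leaves the other 60 bytes of memory unchanged. Storing all 64 bytes costs no fill, which is the case a no-RFO store protocol can exploit. If the fill were skipped for the partial store, the write-back would overwrite those 60 bytes with whatever happened to be in the line.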