Tags: x86, cpu, cpu-cache

How do x86 instructions to read/write data from memory interact with the L1 and L2 caches?


Let's say I have an x86 instruction like this that reads data from an address in memory:

mov eax, word_123456

Presumably this will fetch the data from memory. Now let's say I store it back:

mov word_123456, eax

I know from CPU architecture diagrams that there are caches between random access memory and the CPU. If I ask to store the contents of a register to memory, does it always go to the L1 cache first? Who decides which cache it ends up in? I'm also curious whether you can hint in your x86 instructions that a move should be kept in the cache, or that it's going to be a rare read/write, etc.


Solution

  • By default, everything will go into both the L1 and L2 caches. (I'm simplifying slightly with respect to atomic accesses, but if you're just doing a mov, that's the deal.) It's not really that it goes into the L1 cache "first", so much as that once you've read the data into a register, the cache line containing it is also kept cached for later.

    (I'm also getting a little architecture-specific here. SOME architectures choose to make the two caches exclusive, such that an L2 cache line is removed from the L2 cache to put it into the L1 cache. But this doesn't have a huge effect on code performance, simply because the L2 cache is so much larger than the L1 cache. It's more a bookkeeping thing.)

    The purpose of the L2 cache is to be bigger than the L1 cache, such that if something was in the L1 cache but has since been evicted, hopefully it's still in the L2 cache and doesn't require going all the way to the RAM.
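    On Linux with glibc, you can query the actual sizes of these caches from C. A minimal sketch (note that the `_SC_LEVEL*` names are glibc extensions, not POSIX, and some systems report 0 or -1 for them):

    ```c
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* glibc-specific sysconf names; may return 0 or -1 where unsupported. */
        long l1   = sysconf(_SC_LEVEL1_DCACHE_SIZE);
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        long l2   = sysconf(_SC_LEVEL2_CACHE_SIZE);

        printf("L1 data cache: %ld bytes (line size %ld)\n", l1, line);
        printf("L2 cache: %ld bytes\n", l2);
        return 0;
    }
    ```

    On typical hardware you'll see the L2 cache is many times larger than the L1 data cache, which is exactly why an L1 eviction usually still hits in L2.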

    And yes, you can hint your writes to bypass the cache. This is the purpose of, for instance, movnti. Don't bother manually using movnti for all your write-only accesses, though. The practical performance benefit is small, and even if your current function isn't reading back from the memory, there's a decent chance some other soon-to-be-executed code will.