Search code examples
ccachingcpu-cachemicro-optimizationlibuv

C optimization: conditional store to avoid dirtying a cache line


In the libuv source, I found this code:

  /* The if statement lets the compiler compile it to a conditional store.
   * Avoids dirtying a cache line.
   */
  if (loop->stop_flag != 0)
    loop->stop_flag = 0;

Can someone explain this a bit?

What exactly is a cache line?

Also, I guess a conditional store is some Assembler instruction which checks something and if succeeded, writes some value. Right?

When does such construct makes sense? I guess not always, because otherwise the compiler would just always use the conditional store, right?


Solution

  • A cache is organized in blocks of fast memory that, for historical reasons, are called lines. When you write to a cache line, it is marked as "dirty," meaning a bit is set in the cache controller hardware signifying that the line needs to be copied to other levels of cache and/or main memory before some other part of the system can access it.

    In general, each level of the memory hierarchy: registers, L1, L2, L3 ... cache, main memory, and swap space has different copies of the same information. Making sure that different parts of the system (processors, DMA, video subsystem, etc.) see the same value even though one or more copies may have changed is called the coherency problem.

    The general solution is pausing to copy an updated value to different levels of the hierarchy. This is called a flush.

    A flush can cost 10's to - in the worst case when it causes a page fault - perhaps millions of processor cycles.

    Due to this high cost, hardware designers go to great lengths to minimize the need for flushes. Here the programmer has taken up the cause, too.

    The comment is saying "if the cache already contains a zero in the flag, let's not write a zero over the zero because this will mark the cache line dirty, which may cause an unnecessary flush."

    "Conditional store" is a slightly obscure term. It just refers to a jump on zero around the normal store, which is the code a compiler will produce from the if statement. In X86 it will look something like:

        ;; assume edi holds value of pointer 'loop'
        ;; and flag is a constant offset for the 'stop_flag' field.
        cmp dword ptr [edi, flag], 0
        jz no_store
        mov [edi, flag], 0
    no_store:
       ... code continues
    

    With the if statement missing, you'd have only the last mov instruction.

    NB A commenter pointed out that there do exist single "conditional move/store" instructions on important processor architectures. I have not seen gcc produce one.

    Whether this is a worthwhile optimization is very debatable. Conditionals have their own risks of flushing the instruction pipeline (a different kind of flush). Never sacrifice clarity for speed without clear evidence that it's needed.