What cache invalidation algorithms are used in actual CPU caches?

I came to the topic caching and mapping and cache misses and how the cache blocks get replaced in what order when all blocks are already full.

There is the least recently used algorithm or the fifo algorithm or the least frequently algorithm and random replacement, ...

But what algorithms are used on actual cpu caches? Or can you use all and the... operating system decides what the best algorithm is?

Edit: Even when i chose an answer, any further information is welcome ;)

Solution

As hivert said - it's hard to get a clear picture on the specific algorithm, but one can deduce some of the information according to hints or clever reverse engineering.

You didn't specify which CPU you mean, each one can have a different policy (actually even within the same CPU different cache levels may have different policies, not to mention TLBs and other associative arrays which also may have such policies). I did find a few hints about Intel (specifically Ivy bridge), so we'll use this as a benchmark for industry level "standards" (which may or may not apply elsewhere).

First, Intel presented some LRU related features here - http://www.hotchips.org/wp-content/uploads/hc_archives/hc24/HC24-1-Microprocessor/HC24.28.117-HotChips_IvyBridge_Power_04.pdf

Slide 46 mentioned "Quad-Age LRU" - this is apparently an age based LRU that assigned some "age" to each line according to its predicted importance. They mention that prefetches get middle age, so demands are probably allocated at a higher age (or lower, whatever survives longest), and all lines likely age gradually, so the oldest gets replaced first. Not as good as perfect "fifo-like" LRU, but keep in mind that most caches don't implement that, but rather a complicated pseudo-LRU solution, so this might be an improvement.

Another interesting mechanism mentioned there, which goes the extra mile beyond classic LRU, is adaptive fill policy. There's a pretty good analysis here - http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/ , but in a nutshell (if the blog is correct, and he does seem to make a good match with his results), the cache dynamically chooses between two LRU policies, trying to decide whether the lines are going to be reused or not (and should be kept or not).

I guess this could answer to some extent your question on multiple LRU schemes. Implementing several schemes is probably hard and expensive in terms of HW, but when you have some policy that's complicated enough to have parameters, it's possible to use tricks like dynamic selection, set dueling , etc..