I have studied the cache and how to utilise it effectively for a few years now. I know all about the hierarchy of caches, how memory is fetched in cache-line-sized blocks, how the prefetcher detects memory access patterns and fetches memory in advance accordingly, and even how caching interacts with threads and the pitfalls of caching in multi-threaded programs.
What I have never been able to find out after all this time is how caching works on a computer with multiple concurrently running processes. Over the years, I've realised that each of my programs is just one process being run alongside other processes on the computer. Even if my program is the only program being run, the OS will still be running in the background.
With that being said, how do the caches work with multiple processes running concurrently? Are they shared between each of the processes or is the cached memory of one process evicted upon a context switch? Perhaps the answer is a bit of a hybrid of both?
Most CPUs are designed with caches that are tagged by physical address, so cached data can stay hot across a context switch, even though the accompanying TLB invalidation means the next access to a virtual address requires a page walk to find the right physical page.
If a process migrates to another CPU core, its private L1 and L2 caches will be cold, but the shared L3 will still be hot.