caching, language-agnostic, programming-languages

Questions about cache


I have always wondered how I can control what gets cached in memory.

I always thought it was not possible, at least in C++.

Until one day someone told me not to embed Lua scripts in a C++ application because Lua "...is notorious for completely ruining your cache...".

That got me thinking: is there any way in C++, or any other compiled language, to control what your program caches in memory? If Lua can affect my cache performance, then why can't I?

If so,

i. Is it architecture dependent or OS dependent?

ii. Can you access what is in the cache, or find out what is cached?

Just to be clear, I am talking about the CPU cache.


Solution

  • The CPU caches the data it accesses, and because the cache is small, loading something new means evicting something else, usually whatever was least recently used (real CPUs only approximate this).

    Basically you don't have direct control over it, but indirectly you have some:

    What you have to know is that CPUs work in cache lines. Each cache line is a small, fixed-size block of contiguous memory (commonly 64 bytes on current x86 CPUs).
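
    To make the idea of a cache line concrete, here is a minimal sketch that works out which line an address falls into. The 64-byte line size is an assumption, not something the language guarantees, and the names kAssumedLineSize, cache_line_of, same_cache_line and AlignedInts are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// ASSUMPTION: 64-byte cache lines. Typical on current x86-64, but not guaranteed.
constexpr std::size_t kAssumedLineSize = 64;

// Index of the cache line that address p falls into, under the assumed line size.
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kAssumedLineSize;
}

// True if both addresses would be covered by the same (assumed) cache line.
inline bool same_cache_line(const void* a, const void* b) {
    return cache_line_of(a) == cache_line_of(b);
}

// Hypothetical helper type: an int array that starts exactly on a line boundary.
struct alignas(kAssumedLineSize) AlignedInts {
    int values[64];
};
```

    With 4-byte ints, values[0] through values[15] share one line and values[16] starts the next, so touching any one of the first sixteen elements brings the other fifteen into the cache as well.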

    So if the CPU needs some data, it fetches the whole block. If you have data that is used very frequently but would normally be scattered across memory, you can put it together, for example inside a struct, so that the CPU cache is used more effectively (you cache fewer things that aren't really needed). Note: 99.99% of the time you don't need this kind of optimization.
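
    A rough sketch of that idea, sometimes called a hot/cold split: the fields a loop actually reads are kept in a small struct of their own instead of being interleaved with rarely used data. The types FatEntity and HotEntity, the helper lines_spanned, and the 64-byte line size are all assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// ASSUMPTION: 64-byte cache lines.
constexpr std::size_t kAssumedLineSize = 64;

// Hot and cold fields interleaved: every element drags its cold bytes into the cache.
struct FatEntity {
    float x, y;      // hot: read on every pass
    char notes[56];  // cold: rarely touched
};

// Only the hot fields, packed together.
struct HotEntity {
    float x, y;
};

// Number of assumed cache lines that `count` contiguous elements of T occupy.
template <typename T>
constexpr std::size_t lines_spanned(std::size_t count) {
    return (count * sizeof(T) + kAssumedLineSize - 1) / kAssumedLineSize;
}

// The same loop works on both layouts; only the amount of memory pulled in differs.
template <typename T>
float sum_x(const std::vector<T>& items) {
    float total = 0.0f;
    for (const T& item : items) {
        total += item.x;
    }
    return total;
}
```

    On a platform with 4-byte floats, 1024 FatEntity elements occupy 1024 lines while 1024 HotEntity elements occupy 128, so sum_x has to pull in roughly an eighth of the memory for the same result.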

    A more useful example is walking through a 2D array that doesn't fit into the cache. If you walk it linearly, in the order it is laid out in memory, you load each cache line once, process it, and at some point later the CPU drops it. If you use the indexes the wrong way round, each cache line is loaded multiple times, and because main memory access is slow, your code will be a lot slower. The CPU can also prefetch better if you walk linearly (the direction doesn't matter).
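
    The two traversal orders can be sketched like this, assuming the 2D array is stored row by row in one flat std::vector. The function names and the small timing helper are illustrative only.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Walks memory in layout order: consecutive accesses fall in the same cache line.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Indexes "the wrong way": consecutive accesses are cols elements apart,
// so for a large matrix each access tends to land in a different cache line.
long long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Hypothetical helper: wall-clock time of a callable, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    Both functions return the same sum. To see the difference, time each one with time_ms on a matrix much larger than your last-level cache and make sure the result is actually used (printed, for instance) so the compiler cannot discard the loop. How big the gap is depends on the machine and the optimization level.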

    Cache performance can also be ruined by calling an external library that needs a lot of data and/or code: your main program and its data get evicted from the caches, and when the call finishes the CPU has to load them again.

    If you do heavy optimization and want to know how well you use the L1/L2/... caches, you can run simulations. Valgrind has an excellent tool called Cachegrind which does exactly that.