c++, c, memory-profiling, cachegrind

What is the price of a cache miss?


I'm analyzing some code and using Cachegrind to get the number of cache misses (L2 and L3) during execution.

My question is: how do I determine the time spent waiting on the cache, based on the number of cache misses?

I would like to be able to say something like, "my code gets 90% CPU utilization."

Is it possible to do this based on the Cachegrind output?


Solution

  • Cachegrind simply simulates execution on a CPU, emulating how the cache and branch predictor might behave. Knowing how long you would spend blocked on the cache requires a lot more information. Specifically, you need to know when execution can be speculated and how many instructions can be dispatched in parallel (as well as how many memory accesses can be handled simultaneously). Cachegrind can't do this, and any tool that could would depend heavily on the processor (whereas cache miss counts are much less processor dependent).

    If you have access to a modern Intel CPU, I'd recommend getting a free copy of VTune (for non-commercial purposes) and seeing what it says. It can tell the processor to collect data on cache misses and report it back to you, so you can see what actually happened rather than just simulating it. It will give you a clocks-per-instruction figure for each line of code, and from this you can see which lines are blocking on the cache (and for how long). It can also give you all the other information Cachegrind can.

    You can get it here:

    http://software.intel.com/en-us/articles/non-commercial-software-download/
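
To make the limitation above concrete, here is a minimal sketch of the naive estimate you could compute from Cachegrind's miss counts alone. It assumes every miss stalls the CPU for a fixed penalty and that nothing overlaps, which is exactly what out-of-order execution and parallel memory accesses break, so treat the result as a crude upper bound at best. The struct and function names are made up for illustration, and the penalties and base CPI are placeholders you would have to supply for your own processor.

```cpp
#include <cassert>
#include <cstdint>

// Totals taken from a Cachegrind summary (hypothetical grouping).
struct CachegrindCounts {
    std::uint64_t instructions;  // Ir
    std::uint64_t l1_misses;     // I1mr + D1mr + D1mw
    std::uint64_t ll_misses;     // ILmr + DLmr + DLmw (last-level misses)
};

// ASSUMPTION: fixed per-miss penalties in cycles. Real values depend on
// the CPU and memory system; these fields are placeholders to fill in.
struct MissPenalties {
    double l1_miss_cycles;  // cost of missing L1
    double ll_miss_cycles;  // extra cost of also missing the last level
};

// Naive stall estimate: each miss costs its full penalty and no miss
// overlaps with computation or with another miss.
double estimate_stall_cycles(const CachegrindCounts& c,
                             const MissPenalties& p) {
    return static_cast<double>(c.l1_misses) * p.l1_miss_cycles +
           static_cast<double>(c.ll_misses) * p.ll_miss_cycles;
}

// Fraction of estimated cycles spent stalled, given an assumed base
// cycles-per-instruction for the non-stalled work.
double estimate_stall_fraction(const CachegrindCounts& c,
                               const MissPenalties& p,
                               double base_cpi) {
    assert(base_cpi > 0.0);
    const double stall = estimate_stall_cycles(c, p);
    const double busy = static_cast<double>(c.instructions) * base_cpi;
    return stall / (busy + stall);
}
```

For example, with 1,000,000 instructions, 10,000 L1 misses, 1,000 last-level misses, placeholder penalties of 10 and 100 cycles, and a base CPI of 1.0, this gives 200,000 stall cycles, or about 17% of the estimated total. A hardware-counter tool such as VTune can report a very different figure for the same code, because real misses overlap.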