multithreading shared-memory cpu-architecture cpu-cache

Measure cycles spent accessing a remote cache


How can I measure the cycles spent accessing a shared remote cache, say L3? I need this cache access information both system-wide and per thread. Are there any specific tool or hardware requirements? Or is there a formula I can use to get an approximate value of the cycles spent over a time interval?


Solution

  • To get the average latencies (with a single thread running) to the various caches on your machine, you can use memory profiling tools such as RMMA for Windows (http://cpu.rightmark.org/products/rmma.shtml) and lmbench for Linux.

    You can also write your own benchmarks based on the ideas used by these tools. See the answers to the Stack Overflow question "measuring latencies of memory", or search for how the lmbench benchmark works.

    If you want to find exact latencies for particular memory access patterns, you will need to use a simulator. This lets you trace a memory access as it flows through the memory system. However, simulators do not model all the effects present in a modern processor or memory system.

    If you want to learn how multiple threads affect the average latency to L3, I think the best bet would be to write your own benchmark.
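
As a starting point for writing your own benchmark, here is a minimal sketch of the pointer-chasing idea that latency tools of this kind are commonly built on: every load depends on the result of the previous one, so the average time per load approximates the latency of whichever cache level the working set fits in. The names `build_cycle` and `chase_ns_per_load`, the 64-byte line size, and the use of the C11 wall clock (`timespec_get`) are all assumptions made for illustration, not something taken from RMMA or lmbench.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64  /* assumed cache-line size; adjust for your CPU */

/* One node per cache line, so every hop touches a different line. */
typedef struct node {
    struct node *next;
    char pad[LINE_BYTES - sizeof(struct node *)];
} node_t;

/* Volatile sink: keeps the compiler from discarding the chase loop. */
static void *volatile chase_sink;

/* Small xorshift generator, used only to shuffle the chain. */
static uint64_t rng_state = 88172645463325252ULL;
static uint64_t xorshift64(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Sattolo's algorithm: fills perm with a permutation that is one single
   cycle, so following i -> perm[i] visits every slot before repeating. */
void build_cycle(size_t *perm, size_t n) {
    for (size_t i = 0; i < n; i++) perm[i] = i;
    if (n < 2) return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % i);  /* j in [0, i-1] */
        size_t tmp = perm[i];
        perm[i] = perm[j];
        perm[j] = tmp;
    }
}

/* Average nanoseconds per dependent load over a working set of `bytes`.
   Returns a negative value if the arguments are unusable or malloc fails. */
double chase_ns_per_load(size_t bytes, size_t loads) {
    size_t n = bytes / sizeof(node_t);
    if (n < 2 || loads == 0) return -1.0;

    node_t *nodes = malloc(n * sizeof *nodes);
    size_t *perm = malloc(n * sizeof *perm);
    if (!nodes || !perm) { free(nodes); free(perm); return -1.0; }

    build_cycle(perm, n);
    for (size_t i = 0; i < n; i++) nodes[i].next = &nodes[perm[i]];
    free(perm);

    /* Warm-up pass: touch every node once before timing. */
    node_t *p = nodes;
    for (size_t i = 0; i < n; i++) p = p->next;

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (size_t i = 0; i < loads; i++) p = p->next;  /* dependent loads */
    timespec_get(&t1, TIME_UTC);

    chase_sink = p;
    free(nodes);

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)loads;
}
```

Calling `chase_ns_per_load` from a small `main` with a working set larger than L2 but smaller than L3 should give a figure dominated by L3 latency; multiplying the result by the core frequency in GHz gives an approximate cycle count. To study the multithreaded case, one option is to run one such chase per thread, each on its own chain and pinned to cores that share the L3, and compare the per-load time against the single-threaded run. A monotonic clock or the CPU's cycle counter would be a better timer than the wall clock used here.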