CPU cache performance: store misses vs. load misses

I'm using perf as a basic event counter. I'm working on a program that suffers from data cache store misses, with a miss ratio as high as 80%.

I know how caches work in principle: the cache loads data from memory on the various kinds of misses and evicts data when it pleases. What I don't understand is the difference between store misses and load misses. How do they differ from loading and storing? How can you store-miss?

Solution

A load-miss (as you know) occurs when the processor needs to read data from main memory but the data is not in the cache. Whenever the processor wants some data from main memory, it first queries the cache: if the data is already loaded you get a load-hit, and otherwise you get a load-miss.

A store-miss is the write-side counterpart: it occurs when the processor wants to write newly calculated data back to main memory. When it writes the data, it has to make sure that the contents of the cache and main memory stay in sync with each other. This can be done with two different policies, which you can find here: Writing Policies.
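To make the two writing policies concrete, here is a minimal sketch, assuming they are write-through and write-back as usually described. It models a toy cache that holds a single line and counts how many writes actually reach main memory. The function name `count_memory_writes` and the 64-byte line size are illustrative assumptions, not a model of any particular CPU.

```python
def count_memory_writes(store_addresses, policy, line_size=64):
    """Count writes that reach main memory in a toy one-line cache.

    Hypothetical model for illustration only:
    - "write-through": every store is forwarded to memory immediately.
    - "write-back": a store only marks the cached line dirty; memory is
      written when the dirty line is evicted (or flushed at the end).
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy: " + policy)

    memory_writes = 0
    cached_block = None  # which memory block the single cache line holds
    dirty = False        # write-back only: line differs from memory

    for address in store_addresses:
        block = address // line_size
        if block != cached_block:
            # The line is replaced; a dirty line must be written out first.
            if policy == "write-back" and dirty:
                memory_writes += 1
            cached_block = block
            dirty = False
        if policy == "write-through":
            memory_writes += 1
        else:
            dirty = True

    # Flush whatever is still dirty at the end of the run.
    if policy == "write-back" and dirty:
        memory_writes += 1
    return memory_writes


# Three stores to block 0, then two stores to block 1.
stores = [0, 8, 16, 64, 72]
through = count_memory_writes(stores, "write-through")
back = count_memory_writes(stores, "write-back")
```

With this access pattern, write-through sends every store to memory, while write-back only writes each dirty line once when it leaves the cache.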

So no matter which policy is in use, the processor first checks whether the data is already in the cache so that it can store to the cache first (since that is faster). If the data block it is looking for is not there, for example because it has been evicted, you get a store-miss for that cache.
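As a rough illustration of how a store can miss, here is a minimal sketch of a direct-mapped cache that only tracks tags and counts load and store misses separately. The class `DirectMappedCache`, its parameters, and the `write_allocate` flag are hypothetical names chosen for this example; real hardware (and the events perf reports) is considerably more complicated.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that counts load and store misses."""

    def __init__(self, num_lines=4, line_size=64, write_allocate=True):
        self.num_lines = num_lines
        self.line_size = line_size
        # Assumption: on a store miss, a write-allocate cache fetches the
        # line; a no-write-allocate cache leaves the cache unchanged.
        self.write_allocate = write_allocate
        self.tags = [None] * num_lines  # tag held by each cache line
        self.load_misses = 0
        self.store_misses = 0

    def _lookup(self, address):
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        return index, tag, self.tags[index] == tag

    def load(self, address):
        index, tag, hit = self._lookup(address)
        if not hit:
            self.load_misses += 1
            self.tags[index] = tag  # fetch the line from memory
        return hit

    def store(self, address):
        index, tag, hit = self._lookup(address)
        if not hit:
            self.store_misses += 1
            if self.write_allocate:
                self.tags[index] = tag  # bring the line in, then write
        return hit


def run_demo():
    cache = DirectMappedCache(num_lines=4, line_size=64)
    cache.store(0)    # line not cached yet: store miss
    cache.store(8)    # same 64-byte line: store hit
    cache.load(16)    # same line again: load hit
    cache.load(256)   # maps to the same index, evicts the line: load miss
    cache.store(0)    # the line was evicted: store miss
    return cache.load_misses, cache.store_misses
```

Calling `run_demo()` shows the pattern from the question in miniature: the stores to address 0 miss both when the line has never been cached and after another address has evicted it.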

You can check the applet here to get a better idea of what happens in different scenarios.