Tags: c, caching, gcc, volatile

Is cache invalidation promised in this implementation?


Consider the following code:

volatile uint32_t word;
for (uint32_t i = 0; i < 10; i++)
{
    word = *(uint32_t *)(ADDRESS_IN_MEMORY);
    printf("%"PRIu32, word);
    some_function_compiled_in_other_object();  /* this function may or may not change memory content at address ADDRESS_IN_MEMORY */
}

So, since word is volatile, we know that word = *(uint32_t *)(ADDRESS_IN_MEMORY) will indeed be executed 10 times. But is there any promise regarding the system cache here? I would expect that the compiled code will invalidate ADDRESS_IN_MEMORY before/after each read from this address, so that word will be loaded with the value from system memory and not the cache. Is that promised?

Does the answer depend on whether or not the compiler knows about some_function_compiled_in_other_object changing the value at memory address ADDRESS_IN_MEMORY?


Solution

  • The C standard knows nothing of cache memories. They are an application-specific detail outside the scope of the C language.

    The volatile keyword is only concerned with optimizations performed by the compiler. The compiler needs to ensure that operations on volatile-qualified variables are sequenced in a certain order and not optimized away.

    When reading a hardware register, you must always use volatile or otherwise the compiler can assume that the contents of the register are never changed since previous use.

    So if ADDRESS_IN_MEMORY in your example is a number corresponding to an address, you have a bug there, since you read it as *(uint32_t *)(ADDRESS_IN_MEMORY);. The cast lacks a volatile qualifier: the volatile on word only makes the assignment to word a volatile access, not the pointer dereference itself, so the read should be written *(volatile uint32_t *)(ADDRESS_IN_MEMORY). This bug isn't the slightest related to cache memory.

    Cache memory is managed by the hardware, the CPU's cache controller and prefetcher, not by the compiler nor the C language. And so you cannot affect it directly from application code, unless you access the MMU registers, where you can specify the cacheability of memory regions. It is of course very system-specific. A sound system setup will mark memory-mapped hardware registers as non-cacheable, so that register accesses never go through the data cache.

    You can however write cache-friendly code, by accessing memory consecutively, always reading the next adjacent address from top to bottom, without any branches that can change access order.