linux-kernel, arm, raspberry-pi3

Disabling cache in a kernel module on a Raspberry Pi 3?


On a Raspberry Pi 3B+ (ARMv8), how can I disable the cache (or use some other method) in a kernel module so that when I read from a memory address, neither its value nor the page it is in gets cached? It would be even better if all memory reads bypassed the cache, so that content already cached before the first read is ignored.
In other words, say p is a pointer to memory allocated with kmalloc and I have the following loop:

for (i = 0; i < 10; i++)
    mem = p[i];

My goal is to make sure every address is read directly from RAM and not from the caches.


Solution

  • If it is enough to discourage caching rather than forbid it, you can use non-temporal hints:

    6.3.8 Non-temporal load and store pair

    A new concept in ARMv8 is the non-temporal load and store. These are the LDNP and STNP instructions that perform a read or write of a pair of register values. They also give a hint to the memory system that caching is not useful for this data. The hint does not prohibit memory system activity such as caching of the address, preload, or gathering. However, it indicates that caching is unlikely to increase performance. A typical use case might be streaming data, but take note that effective use of these instructions requires an approach specific to the microarchitecture.

    From The Programmer’s Guide for ARMv8-A.

    The syntax of LDNP is shown below; STNP takes the same operands:

    LDNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]
    LDNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]
    

    These instructions operate on register pairs. If you only need a single value, use WZR or XZR as the second register; the access still covers the full pair, so both locations must be readable. As the description says, the cache may still be used here. However, this method works directly from user mode.


  • From a kernel module, you could also mark the corresponding memory area as non-cacheable. This is normally done for memory-mapped I/O devices, where current values must always be fetched from the device itself rather than from a cache. The attribute is configured in the MMU translation tables. This seems to me the best solution for your case, but it requires some knowledge of the MMU and the paging mechanism (so I would rather not describe it further here).

  • Finally, you could deactivate the caches entirely, but there is already a post about this and its consequences: Disable CPU caches (L1/L2) on ARMv8-A Linux.
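
To make the LDNP option more concrete, here is a minimal sketch using GCC/Clang inline assembly. The helper names load_pair_nontemporal and sum_nontemporal are hypothetical and not part of any kernel or library API. On non-AArch64 targets the code falls back to plain volatile loads so that it still compiles. Keep in mind that LDNP is only a hint: nothing here guarantees that the data bypasses the cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load two adjacent 64-bit values with a
 * non-temporal hint. Both addr[0] and addr[1] must be readable. */
static inline void load_pair_nontemporal(const uint64_t *addr,
                                         uint64_t *first, uint64_t *second)
{
#if defined(__aarch64__)
    uint64_t a, b;
    /* LDNP <Xt1>, <Xt2>, [<Xn>] : hint only, caching is still allowed. */
    __asm__ volatile("ldnp %0, %1, [%2]"
                     : "=&r"(a), "=&r"(b)
                     : "r"(addr)
                     : "memory");
    *first = a;
    *second = b;
#else
    /* Fallback for other architectures: ordinary volatile loads. */
    const volatile uint64_t *v = addr;
    *first = v[0];
    *second = v[1];
#endif
}

/* Hypothetical helper: sum n values, reading them pairwise. */
static inline uint64_t sum_nontemporal(const uint64_t *p, size_t n)
{
    uint64_t sum = 0, a, b;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        load_pair_nontemporal(&p[i], &a, &b);
        sum += a + b;
    }
    if (i < n) /* odd tail: single plain load, no pair available */
        sum += *(const volatile uint64_t *)&p[i];
    return sum;
}
```

The loop reads whole pairs instead of pairing each element with XZR, so it never touches memory past the end of the buffer. If reads must come from RAM in every case, the hint is not sufficient and one of the other two options is needed.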