In my application, at one point I need to perform calculations on a large contiguous block of memory (hundreds of MB). My idea is to keep prefetching the part of the block that my program will touch next, so that the data is already in the cache by the time I perform calculations on it.
Can someone give me a simple example of how to achieve this with gcc? I read about _mm_prefetch somewhere, but I don't know how to use it properly. Also note that I have a multicore system, but each core will be working on a different region of memory in parallel.
gcc provides builtin functions as an interface to low-level instructions. For your case, the relevant one is __builtin_prefetch. However, you should only see a measurable difference when the access pattern is not easy to predict automatically; for a simple linear pass over a contiguous block, the hardware prefetcher usually does this work already.
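As a rough illustration, here is a minimal sketch of __builtin_prefetch on an access pattern that is hard to predict: summing elements of a large array through an index table. The function name sum_indirect and the PREFETCH_DISTANCE value are assumptions made up for this example, not something from the question; the right distance depends on the CPU and on how much work is done per element, so it has to be found by measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
   The best value is machine- and workload-dependent; measure it. */
#define PREFETCH_DISTANCE 64

/* Sums data[idx[0]], data[idx[1]], ..., data[idx[n-1]].
   The indirection through idx makes the addresses hard for the
   hardware prefetcher to guess, so an explicit hint may help. */
double sum_indirect(const double *data, const size_t *idx, size_t n)
{
    assert(n == 0 || (data != NULL && idx != NULL));

    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, rw (0 = read), locality (0..3).
               Locality 1 hints that the data is not reused much. */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        total += data[idx[i]];
    }
    return total;
}
```

The prefetch is only a hint: it does not change the result, and the compiler or CPU may ignore it. With several cores working on disjoint regions, each thread could call such a function on its own slice. It is worth benchmarking with and without the hint, since on a purely sequential scan it may make no difference or even slow things down.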