I want to optimize my local memory access pattern within my OpenCL kernel. I read at somewhere about configurable local memory. E.g. we should be able to configure which amount is used for local mem and which amount is used for automatic caching.
Also i read that the bank size can be chosen for the latest (Kepler) Nvidia hardware here: http://www.acceleware.com/blog/maximizing-shared-memory-bandwidth-nvidia-kepler-gpus. This point seems to be very crucial for double precision value storing in local memory.
Does Nvidia provide the functionality of setting up the local memory exclusively for CUDA users? I can't find similar methods for OpenCL. So is this maybe called in a different way or does it really not exist?
Unfortunately there is no way to control the L1 cache/local memory configuration when using OpenCL. This functionality is only provided by the CUDA runtime (via cudaDeviceSetCacheConfig
or cudaFuncSetCacheConfig
).