I'm prototyping an application with CUDA. I've been benchmarking it against the CPU and noticed some variable runtimes. I decided to run my application in a loop from the command line so I could gather some better statistics. I ran the application 50 times and recorded the results. I was very surprised to see that the elapsed kernel time was increasing as a function of launch number.
Here is a snippet so you can see the part of the code that is being timed:
int nblocks = (int)ceil((float)n / (float)NUM_THREADS);
gpuErrchk(cudaEventRecord(start, 0));
gpuperfkernel<<<nblocks, NUM_THREADS>>>(dmetadata, ddatax, ddatay);
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
gpuErrchk(cudaEventRecord(stop, 0));
gpuErrchk(cudaEventSynchronize(stop));
gpuErrchk(cudaEventElapsedTime(&milliseconds, start, stop));
printf("GPU kernel took %f milliseconds.\n", milliseconds);
gpuelapsed += milliseconds;
I've worked with CUDA quite a bit and haven't seen this behavior before. Has anyone else noticed it? My platform is Windows 10, CUDA 7.5, an MSI notebook with a GeForce 970m.
Since I'm on a laptop I was thinking it might be a power-related setting or something like that, but I have everything set to high performance and have disabled the screen saver.
The GeForce 970m has boost clocks. Run after run, the temperature of your GPU rises, and as the temperature increases the boost clock is less and less likely to stay at its top level, which would explain why successive launches take longer.
You can monitor the GPU temperature with nvidia-smi; there is also a monitoring API (NVML). Your boost settings should also be configurable to some extent through nvidia-smi, should you want to verify this.
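If you want to log temperature and clock programmatically, for example once per iteration of your benchmark loop, a minimal sketch using NVML could look like the following (an illustration, not part of your code; link against the NVML library, nvml.lib on Windows or -lnvidia-ml on Linux):
#include <stdio.h>
#include <nvml.h>  // NVIDIA Management Library
int main(void)
{
    // Initialize NVML and take the first GPU; this sketch assumes device index 0.
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        unsigned int tempC = 0, smClockMHz = 0;
        // Core temperature in degrees C and the current SM clock in MHz.
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &tempC);
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smClockMHz);
        printf("GPU temperature: %u C, SM clock: %u MHz\n", tempC, smClockMHz);
    }
    nvmlShutdown();
    return 0;
}
If the SM clock drops as the temperature climbs across your 50 runs, that points to the boost behavior described above.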
To disable auto boost via nvidia-smi, use this command:
sudo nvidia-smi --auto-boost-default=DISABLED
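To watch the effect from the command line while your loop runs, you can also poll temperature and SM clock with a query such as the one below (available field names are listed by nvidia-smi --help-query-gpu):
nvidia-smi --query-gpu=temperature.gpu,clocks.sm --format=csv -l 1
This prints one CSV line per second, which you can record alongside your 50 timing samples to see whether the clock falls as the temperature rises.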