I need to modify a C/C++ program that has a lot of loops inside one function, and I need to add CUDA functions.
Before I start making changes, I want to measure the time spent in each of those loops, but I haven't found a profiling tool that does exactly that. What is the best way to do this? I'm on Linux. If you have any solutions, let me know.
Here is an example of a tool that does exactly what I want, but I haven't been able to find it or anything similar: http://carbon.ucdenver.edu/~dconnors/papers/wbia06-loopprof.pdf
I would use gperftools and figure out where the code is spending most of its time. Once you have identified a function or part of a function, you're probably done. Understanding exactly which instructions are the "heaviest" in a function requires a long-running test case for that particular loop, so that the profiler can collect enough samples for each instruction (or at least most instructions) in the loop. But profiling down to individual instructions is probably not relevant if you are looking to replace the code with another technology. Replacing a single loop of a few lines is unlikely to help much, because the overhead of offloading it (kernel launches and copying data between host and device) would eat up the gain. Instead, you want to take a larger block and move that across to CUDA.
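If a sampling profiler doesn't give you a clear enough per-loop picture, one crude alternative is to instrument the loops by hand. The following is only a minimal sketch using `std::chrono`: `LoopTimer`, `run_example` and the loop labels are hypothetical names made up for illustration, not part of gperftools or any other library, and the two loops stand in for whatever your function actually does.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: accumulates wall-clock time per named loop.
class LoopTimer {
public:
    using Clock = std::chrono::steady_clock;

    // RAII guard: starts timing when constructed and adds the elapsed
    // time to the owner's totals when it goes out of scope.
    class Scope {
    public:
        Scope(LoopTimer& owner, std::string name)
            : owner_(owner), name_(std::move(name)), start_(Clock::now()) {}
        ~Scope() {
            std::chrono::duration<double> elapsed = Clock::now() - start_;
            owner_.totals_[name_] += elapsed.count();
            owner_.calls_[name_] += 1;
        }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        LoopTimer& owner_;
        std::string name_;
        Clock::time_point start_;
    };

    // Total seconds recorded under this label (0 if never timed).
    double seconds(const std::string& name) const {
        auto it = totals_.find(name);
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Number of times this label was timed (0 if never timed).
    long calls(const std::string& name) const {
        auto it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }

    // Print one line per label to stderr.
    void report() const {
        for (const auto& entry : totals_) {
            std::fprintf(stderr, "%-12s %10.6f s  (%ld calls)\n",
                         entry.first.c_str(), entry.second,
                         calls(entry.first));
        }
    }

private:
    std::map<std::string, double> totals_;
    std::map<std::string, long> calls_;
};

// Stand-in for the big function: two loops, each wrapped in its own scope.
double run_example(LoopTimer& timer, std::size_t n) {
    std::vector<double> data(n);
    {
        LoopTimer::Scope guard(timer, "fill");
        for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<double>(i);
    }
    double sum = 0.0;
    {
        LoopTimer::Scope guard(timer, "sum");
        for (std::size_t i = 0; i < n; ++i) sum += data[i];
    }
    return sum;
}
```

Call `run_example(timer, n)` on a realistic input and then `timer.report()` to see which labels dominate. Put the guard around a whole loop rather than inside its body, otherwise the clock calls themselves will distort the numbers for short iterations.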