Is cuda::SURF_CUDA faster than cv::xfeatures2d::SURF?

I'm trying to build OpenCV with CUDA support to compare cuda::SURF_CUDA with cv::xfeatures2d::SURF, but it's proving challenging.

However, suppose that I want to get SURF descriptors for a high-performance, real-time application. Yes, I know that FAST or ORB are more suitable, but they're binary and I need Euclidean descriptors.

Anyway, the point is that I want to know which of these two implementations is faster given only one (query) image. I think this matters because someone told me that CUDA is only reasonable to use when a lot of images have to be processed, since the time to load them in the GPU memory becomes small compared to the time for computing descriptors, but I don't know if this is true.

Another reason I'm posting this is that I only have an NVIDIA GT755m, which is not a high-end GPU, so my own results might be poor for that reason alone. On the other hand, I'm trying to improve the parallel section of cv::xfeatures2d::SURF (and test it on a Xeon Phi with 64 cores).

Solution

"the time to load them in the GPU memory becomes small compared to the time for computing descriptors" - OP

Yes, you are correct. See here and here for explanations of why CUDA kernels seem slow on their first runs (the first calls pay one-time costs such as CUDA context creation and kernel loading).
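Because the first call pays those one-time costs, a fair comparison should report the cold call separately from the warm ones. Here is a minimal sketch of such a timing harness in plain C++. The names `Timing` and `time_with_warmup` are hypothetical helpers, not part of OpenCV; you would pass in a callable that runs either implementation on your query image. For the GPU case the callable should include whatever waits for the result (for example, downloading the descriptors), otherwise you may only be timing the launch.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// The first (cold) call is reported separately from the mean of the
// later (warm) calls, so one-time setup does not distort the average.
struct Timing {
    double first_ms;   // first call, includes any one-time setup cost
    double steady_ms;  // mean over the calls made after the first one
};

// Hypothetical helper (not an OpenCV API): runs `work` once cold,
// then `timed_runs` more times, and averages the warm runs.
inline Timing time_with_warmup(const std::function<void()>& work,
                               int timed_runs) {
    assert(timed_runs > 0);
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    const auto t0 = clock::now();
    work();  // cold run
    const double first = ms(clock::now() - t0).count();

    const auto t1 = clock::now();
    for (int i = 0; i < timed_runs; ++i) {
        work();  // warm runs
    }
    const double steady = ms(clock::now() - t1).count() / timed_runs;

    return Timing{first, steady};
}
```

If `first_ms` is much larger than `steady_ms` for the GPU path but not for the CPU path, the difference is mostly initialization, and for a single query image it is `first_ms` that you will actually experience unless the process is kept alive and warmed up beforehand.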

For your application, it will depend entirely on the CPU and GPU you're running the code on and on how well the CPU and GPU code is written. As @NAmorim said, it depends on how much overhead your code creates and how much parallelism it is able to exploit.

Note that it could also depend on how many features you are processing, since this factors into both CPU and GPU computation time as well as a large portion of the GPU overhead (think uploading/downloading the descriptors to and from the GPU).
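To make the trade-off concrete, here is a rough cost model, again only a sketch. It assumes the GPU path has a one-time initialization cost plus a fixed per-image transfer and compute cost, and that the CPU path has a fixed per-image cost. All names and fields are hypothetical, and the numbers must come from your own measurements; real costs also vary with image size and feature count.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical measured costs for the GPU path, in milliseconds.
struct GpuCost {
    double init_ms;      // one-time cost (context creation, first-call setup)
    double transfer_ms;  // per image: upload image + download results
    double compute_ms;   // per image: kernel time
};

// Total GPU time for `images` images under the simplified model.
inline double gpu_total_ms(const GpuCost& g, int images) {
    assert(images >= 0);
    return g.init_ms + images * (g.transfer_ms + g.compute_ms);
}

// Total CPU time for `images` images at a fixed per-image cost.
inline double cpu_total_ms(double cpu_per_image_ms, int images) {
    assert(images >= 0);
    return images * cpu_per_image_ms;
}

// Smallest image count for which the GPU total is strictly below the
// CPU total, or -1 if the GPU never catches up under this model.
inline int break_even_images(const GpuCost& g, double cpu_per_image_ms) {
    const double gain = cpu_per_image_ms - (g.transfer_ms + g.compute_ms);
    if (gain <= 0.0) {
        return -1;  // per-image GPU cost is not lower than the CPU cost
    }
    return static_cast<int>(std::floor(g.init_ms / gain)) + 1;
}
```

With made-up numbers such as 300 ms of initialization, 2 ms of transfer, 3 ms of compute, and 20 ms per image on the CPU, the GPU only wins from the 21st image onward. For a single query image in a freshly started process, this model says the CPU wins unless the initialization cost has already been paid.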