I am testing the performance of some samples in the OpenCV source tree, depending on whether Halide is used or not.
Surprisingly, the performance is worse when Halide is used for the computation.
I compiled Halide and OpenCV following the instructions in this tutorial. The OpenCV code was downloaded from the master branch of the OpenCV git repository.
I tested the performance using the sample files 'resnet_ssd_face.cpp' and 'squeezenet_halide.cpp'. In both cases I add one of these lines just before the call to 'forward', to enable or disable Halide:
net.setPreferableBackend(DNN_BACKEND_HALIDE); // use Halide
net.setPreferableBackend(DNN_BACKEND_DEFAULT); // NOT use Halide
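For reference, here is a minimal sketch of how I toggle the backend around 'forward'. The model files, input image, and blob parameters are placeholders, not the exact values used in the samples:

#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>

int main() {
    // Placeholder model files; the actual samples load their own prototxt/caffemodel.
    cv::dnn::Net net = cv::dnn::readNetFromCaffe("model.prototxt", "model.caffemodel");

    // Switch between the two backends here.
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_HALIDE);    // use Halide
    // net.setPreferableBackend(cv::dnn::DNN_BACKEND_DEFAULT); // do NOT use Halide

    // Placeholder input and preprocessing parameters.
    cv::Mat img = cv::imread("input.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(img, 1.0, cv::Size(227, 227),
                                          cv::Scalar(104, 117, 123), false);
    net.setInput(blob);
    cv::Mat out = net.forward();
    std::cout << "Output elements: " << out.total() << std::endl;
    return 0;
}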
The time is measured using this code just after the call to the 'forward' function:
std::vector<double> layersTimings;
double freq = cv::getTickFrequency() / 1000;             // ticks per millisecond
double time = net.getPerfProfile(layersTimings) / freq;  // total inference time in ms
std::cout << "Time: " << time << " ms" << std::endl;
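To see which layers are affected, the per-layer timings returned by 'getPerfProfile' can also be dumped (a minimal sketch; layers are identified only by their index here):

std::vector<double> layersTimings;
net.getPerfProfile(layersTimings);
double freq = cv::getTickFrequency() / 1000; // ticks per millisecond
for (size_t i = 0; i < layersTimings.size(); ++i)
    std::cout << "layer " << i << ": " << layersTimings[i] / freq << " ms" << std::endl;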
Is anything missing in the tutorial? Should Halide be compiled with different parameters?
My setup is:
OS: Linux (Ubuntu 16.04)
CPU: Intel(R) Core(TM) i5-4570 @ 3.20GHz
GPU: NVIDIA GeForce GT 730 (driver version 384.90)
CUDA: 9.0.176
Taking into account the comment by Dmitry Kurtaev and looking at the wiki of the OpenCV GitHub account, I found a page with a benchmark comparing different approaches (I had missed the links in the tutorial).
There is also a pull request that includes a similar benchmark.
In both of them, the time measurements show that performance with Halide is worse than with the original C++ implementation.
I can only assume that the Halide integration is at an early stage. Moreover, as Zalman Stern comments, the Halide scheduling is a work in progress, and the hand-written optimizations in OpenCV's dnn module may simply be better tuned than the schedules currently shipped for Halide.
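One thing I have not tried yet: the dnn API also exposes 'Net::setHalideScheduler', which loads scheduling directives from a YAML file, so a hand-written schedule could in principle be supplied. The file name below is only a placeholder:

// Hypothetical scheduling file; see the OpenCV Halide scheduling tutorial for the format.
net.setHalideScheduler("squeezenet_scheduler.yml");
net.setPreferableBackend(DNN_BACKEND_HALIDE);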
I hope these measurements improve in future versions of OpenCV, but for now, this is the performance I observe.