Tags: opencv, computer-vision, time-complexity, sift

Why does SIFT cost more time with fewer octave layers?


I'm using the SIFT feature detector in OpenCV 4.5.2. By tuning the nOctaveLayers parameter in cv::SIFT::create(), I get these results from detectAndCompute():

nOctaveLayers   KeyPoints   Time Cost (ms)
1               1026        63.41
2               1795        45.07
3               2043        45.74
4               2173        47.83
5               2224        51.86

To my understanding, there should be less computation with fewer octave layers, so why does SIFT cost significantly more time with only 1 octave layer?

I also tested detect() and compute() separately, and both cost more time when nOctaveLayers is 1, which confuses me a lot.
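
For reference, the split test used the same timing loop as the full test code below; the calls were essentially (a sketch, reusing the sift, img_src, kps, and descs objects from that snippet):

sift->detect(img_src, kps);          // detection alone: also slower with nOctaveLayers = 1
sift->compute(img_src, kps, descs);  // description alone: same trend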

The test image is here (from the TUM open dataset). Thanks in advance for any help.


[Edit for @Micka] My test code:

#include <opencv2/opencv.hpp>
#include <chrono>
#include <iostream>
#include <vector>
using namespace cv;
using namespace std;

// img_src is the test image (TUM dataset), loaded beforehand as grayscale
const int test_num = 100;
const int layers = 5;  // the nOctaveLayers value under test
cout << "layers: " << layers << endl;

auto sift = SIFT::create(0, layers);  // nfeatures = 0: keep all keypoints
vector<KeyPoint> kps;
Mat descs;

auto t1 = chrono::high_resolution_clock::now();
for (int i = 0; i < test_num; ++i)
    sift->detectAndCompute(img_src, noArray(), kps, descs);
auto t2 = chrono::high_resolution_clock::now();
cout << "num of kps: " << kps.size() << endl;
cout << "avg time cost (ms): " << chrono::duration<double>(t2 - t1).count() * 1e3 / test_num << endl;

For each nOctaveLayers configuration, I change the layers value in the code, recompile, run, and record the result.


Solution

  • After hours of profiling, I finally found the reason: GaussianBlur.

    The pipeline of the SIFT algorithm is:

    1. Create the initial image: convert the source image to float, double its resolution, and apply GaussianBlur (sigma = sqrt(1.6^2 - (2*0.5)^2) = sqrt(1.56) ≈ 1.25; see the sketch after this list)
    2. Build the Gaussian pyramid
    3. Find keypoints: build the DoG pyramid and find scale-space extrema
    4. Calculate descriptors
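
    For illustration, step 1 is roughly equivalent to this sketch (simplified from OpenCV's createInitialImage(); the helper name is mine, and the headers/namespaces from the question's test code are assumed):

    // Rough sketch of step 1, simplified from OpenCV's createInitialImage()
    Mat createBaseImage(const Mat& gray) {
        Mat fpt, dbl;
        gray.convertTo(fpt, CV_32F);                   // 1) convert to float
        resize(fpt, dbl, Size(), 2, 2, INTER_LINEAR);  // 2) double the resolution
        // 3) pre-blur: the doubled image already carries sigma = 2 * 0.5
        double sig_diff = sqrt(max(1.6 * 1.6 - 4 * 0.5 * 0.5, 0.01));  // ≈ 1.25
        GaussianBlur(dbl, dbl, Size(), sig_diff, sig_diff);
        return dbl;
    }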

    The number of octaves is calculated from the image resolution (see here), and nOctaveLayers controls the number of layers (nOctaveLayers + 3 for the Gaussian pyramid) in each octave.
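
    Concretely, the octave count in the linked source reduces to the following (a sketch; cvRound and the +1 for the doubled base image follow OpenCV's code, where firstOctave = -1):

    // Octave count for the doubled base image (firstOctave = -1 inside cv::SIFT)
    int nOctaves = cvRound(log((double)min(base.cols, base.rows)) / log(2.) - 2) + 1;
    // e.g. a 640x480 source doubles to 1280x960: cvRound(log2(960) - 2) + 1 = 9 octaves
    // each octave then holds nOctaveLayers + 3 Gaussian layers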

    Indeed, when nOctaveLayers increases, the numbers of layers and keypoints both increase, so the time cost of steps 3 and 4 grows. However, with parallel computation, this increase is not very significant (a few milliseconds).

    In contrast, step 2 costs more than half of the total time: 25.27 ms (of 43.49 ms) when nOctaveLayers is 3, and 51.16 ms (of 63.10 ms) when nOctaveLayers is 1. So why does this happen?

    Because with fewer layers per octave, the sigmas passed to GaussianBlur() grow faster from layer to layer, and sigma largely determines the cost of GaussianBlur(): when no explicit kernel size is given, OpenCV derives it from sigma (roughly 8*sigma + 1 for float images), so a large sigma means a much larger kernel. See the test below:

    // Per-layer sigma values computed by cv::SIFT for nOctaveLayers = 1, 3, 5
    vector<double> sig1 = { 1.6, 2.77128, 5.54256, 11.0851 };
    vector<double> sig3 = { 1.6, 1.22627, 1.54501, 1.94659, 2.45255, 3.09002 };
    vector<double> sig5 = { 1.6, 0.9044, 1.03888, 1.19336, 1.37081, 1.57465, 1.8088, 2.07777 };
    
    auto blurTest = [](const vector<double>& sigs, const string& label) {
        const int test_num = 100;
        auto t1 = chrono::high_resolution_clock::now();
        for (int i = 0; i < test_num; ++i) {
            // build one octave's worth of incrementally blurred layers
            vector<Mat> pyr(sigs.size());
            pyr[0] = Mat::zeros(960, 1280, CV_32FC1);  // size of the doubled base image
            for (size_t j = 1; j < sigs.size(); ++j)
                GaussianBlur(pyr[j - 1], pyr[j], Size(), sigs[j], sigs[j]);
        }
        auto t2 = chrono::high_resolution_clock::now();
        auto time = chrono::duration<double>(t2 - t1).count() * 1e3 / test_num;
        cout << label << ": " << time << " ms\n";
    };
    
    blurTest(sig1, "1");
    blurTest(sig3, "3");
    blurTest(sig5, "5");
    
    /* output:
    1: 45.3958 ms
    3: 28.5943 ms
    5: 31.4827 ms
    */
    

    The code above simulates buildGaussianPyramid() for nOctaveLayers = 1, 3, and 5; the sigma values are taken from cv::SIFT's internal calculation. This explains why SIFT costs much more time when nOctaveLayers is 1.
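
    For completeness, those sigma vectors follow the recurrence in OpenCV's buildGaussianPyramid(); here is a sketch (the helper name is illustrative, but the formula mirrors the library source):

    // Hypothetical helper; the formula mirrors OpenCV's buildGaussianPyramid()
    vector<double> computeSigmas(int nOctaveLayers, double sigma = 1.6) {
        vector<double> sig(nOctaveLayers + 3);
        double k = pow(2., 1. / nOctaveLayers);  // scale step between adjacent layers
        sig[0] = sigma;
        for (int i = 1; i < nOctaveLayers + 3; ++i) {
            double sig_prev = pow(k, (double)(i - 1)) * sigma;
            double sig_total = sig_prev * k;
            // incremental blur needed to get from sig_prev to sig_total
            sig[i] = sqrt(sig_total * sig_total - sig_prev * sig_prev);
        }
        return sig;  // nOctaveLayers = 1 yields { 1.6, 2.77128, 5.54256, 11.0851 }
    }

    With nOctaveLayers = 1, k = 2, so each incremental sigma doubles; with 5 layers, k = 2^(1/5) ≈ 1.15 and every incremental sigma stays below ~2.1, keeping the Gaussian kernels small.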