I'm using the SIFT feature detector in OpenCV 4.5.2. By tuning the `nOctaveLayers` parameter in `cv::SIFT::create()`, I get these results from `detectAndCompute()`:
| nOctaveLayers | KeyPoints | Time Cost (ms) |
|---|---|---|
| 1 | 1026 | 63.41 |
| 2 | 1795 | 45.07 |
| 3 | 2043 | 45.74 |
| 4 | 2173 | 47.83 |
| 5 | 2224 | 51.86 |
To my understanding, there should be less computation with fewer octave layers, so why does SIFT cost significantly more time with only 1 octave layer?
I also tested `detect()` and `compute()` separately, and they both cost more time when `nOctaveLayers` is 1, which confuses me a lot.
The test image is here (from the TUM open dataset). Thanks ahead for any help.
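For reference, the separate `detect()`/`compute()` timing mentioned above was done roughly like the sketch below (the helper name is mine, and the image/loop count match the test code that follows):

```cpp
#include <chrono>
#include <iostream>
#include <vector>
#include <opencv2/features2d.hpp>

using namespace cv;
using namespace std;

// Hypothetical helper: time detect() and compute() separately on the test image.
void timeDetectAndComputeSeparately(const Mat& img_src, int layers)
{
    const int test_num = 100;
    auto sift = SIFT::create(0, layers);
    vector<KeyPoint> kps;
    Mat descs;

    auto t1 = chrono::high_resolution_clock::now();
    for (int i = 0; i < test_num; ++i)
        sift->detect(img_src, kps);
    auto t2 = chrono::high_resolution_clock::now();
    for (int i = 0; i < test_num; ++i)
        sift->compute(img_src, kps, descs);
    auto t3 = chrono::high_resolution_clock::now();

    cout << "detect:  " << chrono::duration<double>(t2 - t1).count() * 1e3 / test_num << " ms\n";
    cout << "compute: " << chrono::duration<double>(t3 - t2).count() * 1e3 / test_num << " ms\n";
}
```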
[Edit for @Micka] My test code:

```cpp
const int test_num = 100;
const int layers = 5;

cout << "layers: " << layers << endl;
auto sift = SIFT::create(0, layers);
vector<KeyPoint> kps;
Mat descs;

auto t1 = chrono::high_resolution_clock::now();
for (int i = 0; i < test_num; ++i)
    sift->detectAndCompute(img_src, noArray(), kps, descs);
auto t2 = chrono::high_resolution_clock::now();

cout << "num of kps: " << kps.size() << endl;
cout << "avg time cost: " << chrono::duration<double>(t2 - t1).count() * 1e3 / test_num << " ms" << endl;
```
For each `nOctaveLayers` configuration, I change the `layers` value in the code, recompile, run, and record the result.
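(The same sweep can also be done in a single run without recompiling; a minimal sketch, assuming `img_src` is already loaded and using the same timing approach as above:)

```cpp
#include <chrono>
#include <iostream>
#include <vector>
#include <opencv2/features2d.hpp>

using namespace cv;
using namespace std;

// Hypothetical helper: benchmark detectAndCompute() for several nOctaveLayers values.
void sweepOctaveLayers(const Mat& img_src)
{
    const int test_num = 100;
    for (int layers = 1; layers <= 5; ++layers) {
        auto sift = SIFT::create(0, layers);
        vector<KeyPoint> kps;
        Mat descs;

        auto t1 = chrono::high_resolution_clock::now();
        for (int i = 0; i < test_num; ++i)
            sift->detectAndCompute(img_src, noArray(), kps, descs);
        auto t2 = chrono::high_resolution_clock::now();

        cout << layers << " layers: " << kps.size() << " kps, "
             << chrono::duration<double>(t2 - t1).count() * 1e3 / test_num << " ms" << endl;
    }
}
```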
After hours of profiling, I finally found the reason: `GaussianBlur`.

The pipeline of the SIFT algorithm is:

1. Create the initial image: convert the source image to `float`, double the resolution, and do a `GaussianBlur` (sigma=1.56)
2. Build the Gaussian pyramid
3. Find keypoints in the DoG pyramid
4. Compute descriptors for the keypoints

The number of octaves is calculated from the image resolution (see here), and `nOctaveLayers` controls the number of layers (`nOctaveLayers + 3` for the Gaussian pyramid) in each octave. It also determines the per-layer blur sigmas, as sketched below.
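Here is a minimal sketch of the per-octave sigma schedule as I read it from the OpenCV source (the helper is mine; treat the exact formula as my interpretation): each layer is blurred incrementally from the previous one, with `k = 2^(1/nOctaveLayers)`, so fewer layers means a larger `k` and much larger incremental sigmas.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Sketch of the per-octave blur schedule behind buildGaussianPyramid():
// each octave has nOctaveLayers + 3 Gaussian images, and layer i is obtained
// from layer i-1 by an incremental blur bridging sigma0*k^(i-1) -> sigma0*k^i.
std::vector<double> pyramidSigmas(int nOctaveLayers, double sigma0 = 1.6)
{
    const double k = std::pow(2.0, 1.0 / nOctaveLayers);
    std::vector<double> sig(nOctaveLayers + 3);
    sig[0] = sigma0;
    for (int i = 1; i < (int)sig.size(); ++i) {
        const double sig_prev  = sigma0 * std::pow(k, i - 1);
        const double sig_total = sig_prev * k;
        sig[i] = std::sqrt(sig_total * sig_total - sig_prev * sig_prev);
    }
    return sig;
}

int main()
{
    for (int layers : {1, 3, 5}) {
        std::cout << layers << ": ";
        for (double s : pyramidSigmas(layers))
            std::cout << s << " ";
        std::cout << "\n";
    }
}
```

For `nOctaveLayers = 1` this reproduces the {1.6, 2.77, 5.54, 11.09} sequence used in the blur test further below; with more layers, `k` is smaller and the incremental sigmas stay around 1–3.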
Indeed, when `nOctaveLayers` increases, the number of layers and the number of keypoints both increase, so the time cost of steps 3 & 4 increases. However, with parallel computation this increment is not very remarkable (a few milliseconds).

In contrast, step 2 costs more than half of the total time: 25.27 ms (out of 43.49 ms) when `nOctaveLayers` is 3, and 51.16 ms (out of 63.10 ms) when `nOctaveLayers` is 1. So why does this happen?
Because the sigma for `GaussianBlur()` increases faster when there are fewer layers, and sigma is critical for the time consumed by `GaussianBlur()`. See the test below:
```cpp
// Sigma values taken from cv::SIFT's pyramid calculation for one octave,
// with nOctaveLayers = 1, 3, and 5 respectively.
vector<double> sig1 = { 1.6, 2.77128, 5.54256, 11.0851 };
vector<double> sig3 = { 1.6, 1.22627, 1.54501, 1.94659, 2.45255, 3.09002 };
vector<double> sig5 = { 1.6, 0.9044, 1.03888, 1.19336, 1.37081, 1.57465, 1.8088, 2.07777 };

auto blurTest = [](const vector<double>& sigs, const string& label) {
    const int test_num = 100;
    auto t1 = chrono::high_resolution_clock::now();
    for (int i = 0; i < test_num; ++i) {
        vector<Mat> pyr;
        pyr.resize(sigs.size());
        pyr[0] = Mat::zeros(960, 1280, CV_32FC1);
        for (size_t j = 1; j < sigs.size(); ++j)
            GaussianBlur(pyr[j - 1], pyr[j], Size(), sigs[j], sigs[j]);
    }
    auto t2 = chrono::high_resolution_clock::now();
    auto time = chrono::duration<double>(t2 - t1).count() * 1e3 / test_num;
    cout << label << ": " << time << " ms\n";
};

blurTest(sig1, "1");
blurTest(sig3, "3");
blurTest(sig5, "5");

/* output:
1: 45.3958 ms
3: 28.5943 ms
5: 31.4827 ms
*/
```
The code above simulates `buildGaussianPyramid()` when `nOctaveLayers` is 1, 3, and 5; the sigma values are taken from the `cv::SIFT` calculation. This explains why SIFT costs much more time when `nOctaveLayers` is 1.
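The underlying cost driver is the kernel width: when `GaussianBlur()` is called with an empty `Size()`, OpenCV derives the kernel size from sigma, so a larger sigma means a wider separable kernel and more work per pixel. A quick way to see the effect in isolation is the sketch below (a single blur per sigma on a SIFT-sized float image; the chosen sigmas are the largest incremental values from each schedule above):

```cpp
#include <chrono>
#include <iostream>
#include <opencv2/imgproc.hpp>

using namespace cv;
using namespace std;

// Time a single GaussianBlur on a 1280x960 float image for a given sigma.
// With ksize = Size(), the kernel width is computed from sigma, so larger
// sigmas produce wider kernels and take longer.
static void timeBlur(double sigma)
{
    const int test_num = 100;
    Mat src = Mat::zeros(960, 1280, CV_32FC1), dst;

    auto t1 = chrono::high_resolution_clock::now();
    for (int i = 0; i < test_num; ++i)
        GaussianBlur(src, dst, Size(), sigma, sigma);
    auto t2 = chrono::high_resolution_clock::now();

    cout << "sigma " << sigma << ": "
         << chrono::duration<double>(t2 - t1).count() * 1e3 / test_num << " ms\n";
}

int main()
{
    // ~2.08 (5 layers), ~3.09 (3 layers), ~11.09 (1 layer)
    for (double s : {2.07777, 3.09002, 11.0851})
        timeBlur(s);
}
```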