c++ image-processing computer-vision sift vlfeat

How to compute PHOW features for an image in C++ with VLFeat and OpenCV?

I have implemented a PHOW feature detector in MATLAB as follows:

    [frames, descrs] = vl_phow(im);

which is a wrapper around the following code:

    ...
    for i = 1:4
        ims = vl_imsmooth(im, scales(i) / 3) ;
        [frames{i}, descrs{i}] = vl_dsift(ims, 'Fast', 'Step', step, 'Size', scales(i)) ;
    end
    ...

I'm now writing an implementation in C++ with OpenCV and VLFeat. This is the part of my code that calculates PHOW features for an image (`Mat image`):

    ...
    // convert into float array
    float* img_vec = im2single(image);

    // create filter
    VlDsiftFilter* vlf = vl_dsift_new(image.cols, image.rows);

    double bin_sizes[] = { 3, 4, 5, 6 };
    double magnif = 3;
    double* scales = (double*)malloc(4*sizeof(double));
    for (size_t i = 0; i < 4; i++)
    {
        scales[i] = bin_sizes[i] / magnif;
    }
    for (size_t i = 0; i < 4; i++)
    {
        double sigma = sqrt(pow(scales[i], 2) - 0.25);

        // smooth float array image
        float* img_vec_smooth = (float*)malloc(image.rows*image.cols*sizeof(float));
        vl_imsmooth_f(img_vec_smooth, image.cols, img_vec, image.cols, image.rows, image.cols, sigma, sigma);

        // run DSIFT
        vl_dsift_process(vlf, img_vec_smooth);

        // number of keypoints found
        int keypoints_num = vl_dsift_get_keypoint_num(vlf);

        // extract keypoints
        const VlDsiftKeypoint* vlkeypoints = vl_dsift_get_keypoints(vlf);

        // descriptor dimension
        int dim = vl_dsift_get_descriptor_size(vlf);

        // extract descriptors
        const float* descriptors = vl_dsift_get_descriptors(vlf);
    ...

    // return all descriptors of different scales

I'm not sure what the function should return: the set of all descriptors for all scales, which requires a lot of storage space when processing several images, or the result of some operation that combines the descriptors of the different scales. Can you help me with this? Thanks.

Solution

You can do either. The simplest approach is to concatenate the descriptors from the different scales. I believe this is what VLFeat does (at least the documentation does not mention anything more). Removing the descriptors that fall below your contrast threshold should help, but you will still have several thousand per image, depending on its size. You could also compare the descriptors that occur near the same location and prune the redundant ones; it's a time-space trade-off. Generally, I've seen the bin sizes spaced apart (by intervals of 2, but it could be more), which should reduce the need to check for overlapping descriptors.
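
As a rough illustration of the concatenation approach, here is a minimal sketch that appends the output of each scale to one growing buffer and skips keypoints below a contrast threshold. The `Keypoint` and `PhowFeatures` types and the `appendScale` and `descriptorBytes` helpers are hypothetical names made up for this example, not part of VLFeat. The `norm` field is assumed to play the same role as the `norm` member of `VlDsiftKeypoint`, and the descriptors are assumed to be stored back to back, `dim` floats per keypoint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the per-keypoint data of one scale
// (hypothetical type for this sketch, not a VLFeat struct).
struct Keypoint {
    double x;     // centre column
    double y;     // centre row
    double norm;  // descriptor norm, used here as the contrast measure
};

// Features of all scales concatenated: descriptor k occupies
// descriptors[k * dim] .. descriptors[k * dim + dim - 1].
struct PhowFeatures {
    int dim = 0;
    std::vector<float> descriptors;
    std::vector<Keypoint> keypoints;
    std::vector<int> scaleIndex;  // which scale produced each keypoint
};

// Append the descriptors of one scale, dropping keypoints whose norm is
// below contrastThreshold. Returns the number of descriptors kept.
std::size_t appendScale(PhowFeatures& out,
                        const Keypoint* keypoints,
                        const float* descriptors,
                        std::size_t num,
                        int dim,
                        int scale,
                        double contrastThreshold)
{
    assert(dim > 0);
    assert(out.dim == 0 || out.dim == dim);  // all scales must share one size
    out.dim = dim;

    std::size_t kept = 0;
    for (std::size_t k = 0; k < num; ++k) {
        if (keypoints[k].norm < contrastThreshold) {
            continue;  // low-contrast patch: skip it to save space
        }
        const float* d = descriptors + k * static_cast<std::size_t>(dim);
        out.descriptors.insert(out.descriptors.end(), d, d + dim);
        out.keypoints.push_back(keypoints[k]);
        out.scaleIndex.push_back(scale);
        ++kept;
    }
    return kept;
}

// Bytes occupied by the concatenated descriptor data.
std::size_t descriptorBytes(const PhowFeatures& f)
{
    return f.descriptors.size() * sizeof(float);
}
```

Inside the loop over scales, this would be called once per iteration with the keypoints and descriptors obtained after `vl_dsift_process`, copied into `Keypoint` values first. Copying per scale is the safe assumption, since the pointers returned by the filter most likely refer to buffers that the next `vl_dsift_process` call overwrites. Keeping `scaleIndex` alongside the positions also leaves the door open for pruning descriptors from different scales that sit near the same location.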