c++opencv computer-vision gpu pattern-recognition

HOG features for object detection using GPU in OpenCV

In my project I am calculating HOG features on GPU for different levels in the same image. My aim is to detect the following objects.
1. Truck
2. Car
3. Person
Most important question is the selection of window size in case of multi class object detector. This post provide a very good base but it does not provide an answer for the selection of window size in case of multi class feature.
To solve this problem I calculated the HOG features of each positive image at different levels/resolution keeping the window size(48*96) same but the file for each image is around 600 MB which is too large.
Please let me know how to select the window size, block size and cell size in case of multi class object detection. Here is my code that I used to calculate the HOG features.

void App::run()
{
    unsigned int count = 1;
    FileStorage fs;
    running = true;

    //int width;
    //int height;

    Size win_size(args.win_width, args.win_width * 2); 
    Size win_stride(args.win_stride_width, args.win_stride_height);

    cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9,
                                   cv::gpu::HOGDescriptor::DEFAULT_WIN_SIGMA, 0.2, gamma_corr,
                                   cv::gpu::HOGDescriptor::DEFAULT_NLEVELS);

    VideoCapture vc("/home/ubuntu/Desktop/getdescriptor/images/image%d.jpg");
    Mat frame;
    Mat Left;
    Mat img_aux, img, img_to_show, img_new;
    cv::Mat temp;
    gpu::GpuMat gpu_img, descriptors, new_img;

    char cbuff[20];



    while (running)
    {

        vc.read(frame);


        if (!frame.empty())
        {
            workBegin();

            width  = frame.rows;
            height = frame.cols;

            sprintf (cbuff, "%04d", count);

            // Change format of the image
            if (make_gray) cvtColor(frame, img_aux, CV_BGR2GRAY);
            else if (use_gpu) cvtColor(frame, img_aux, CV_BGR2BGRA);
            else Left.copyTo(img_aux);

            // Resize image
            if (args.resize_src) resize(img_aux, img, Size(args.width, args.height));
            else img = img_aux;
            img_to_show = img;

            gpu_hog.nlevels = nlevels;

            hogWorkBegin();
            if (use_gpu)
            {
                gpu_img.upload(img);
                new_img.upload(img_new);

                fs.open(cbuff, FileStorage::WRITE);


                for(int levels = 0; levels < nlevels; levels++)
                {
                gpu_hog.getDescriptors(gpu_img, win_stride, descriptors, cv::gpu::HOGDescriptor::DESCR_FORMAT_ROW_BY_ROW);
                descriptors.download(temp);

                //printf("size %d %d\n", temp.rows, temp.cols);

                fs <<"level" << levels;                
                fs << "features" << temp;

                cout<<"("<<width<<","<<height<<")"<<endl;

                width =  round(width/scale);
                height = round(height/scale);

                if( width < win_size.width || height < win_size.height )
                break;

                cout<<"Levels "<<levels<<endl;

                resize(img,img_new,Size(width,height));
                scale *= scale;
                }

                cout<<count<< " Image feature calculated !"<<endl;
                count++;
                //width = 640; height = 480;
                scale = 1.05;

            }

            hogWorkEnd();
            fs.release();
          }
           else  running = false;
       }
}

Solution

The window size should be chosen, s.t. the object(s) you want to detect fit into the window. If you want to have different window sizes for different types this might become tricky.

Usually what you do is the following

Take training data for each type of objects, and train [number of object types] many models using the features extracted at the known position of the objects.
Then you take each test image and use a sliding window approach to extract features at each location. These features are then compared to each model. If one of the models lead to a score higher than a certain threshold you have found this object. If more than one model scores higher than the threshold simply take the one scoring highest.

If you want to use differently sized detection windows you will get feature vectors of different size (by nature of the HoG features). The tricky thing is, that in the testing phase you have to use as many sliding windows as object types you use. This would definitely work, but you have to process each testing image several times leading to higher processing time)

To answer your question of the sizes: There is no value I can give you, it always depends on your images. Using an image pyramid as you mentioned above is a good way to deal with differently scaled objects.

window size: the whole object should fit in; has to be divisible by block size
block size has to be divisible by cell size

Sample code for visualization of HoG features can be found here. This also helps understand how the feature vectors look like.

EDIT: Found out the hard way, that only cv::Size(8,8) is allowed for cell size. See documentation.