In my project I am calculating HOG features on GPU for different levels in the same image. My aim is to detect the following objects.
1. Truck
2. Car
3. Person
Most important question is the selection of window size in case of multi class object detector. This post provide a very good base but it does not provide an answer for the selection of window size in case of multi class feature.
To solve this problem I calculated the HOG features of each positive image at different levels/resolution keeping the window size(48*96) same but the file for each image is around 600 MB which is too large.
Please let me know how to select the window size, block size and cell size in case of multi class object detection. Here is my code that I used to calculate the HOG features.
void App::run()
{
unsigned int count = 1;
FileStorage fs;
running = true;
//int width;
//int height;
Size win_size(args.win_width, args.win_width * 2);
Size win_stride(args.win_stride_width, args.win_stride_height);
cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9,
cv::gpu::HOGDescriptor::DEFAULT_WIN_SIGMA, 0.2, gamma_corr,
cv::gpu::HOGDescriptor::DEFAULT_NLEVELS);
VideoCapture vc("/home/ubuntu/Desktop/getdescriptor/images/image%d.jpg");
Mat frame;
Mat Left;
Mat img_aux, img, img_to_show, img_new;
cv::Mat temp;
gpu::GpuMat gpu_img, descriptors, new_img;
char cbuff[20];
while (running)
{
vc.read(frame);
if (!frame.empty())
{
workBegin();
width = frame.rows;
height = frame.cols;
sprintf (cbuff, "%04d", count);
// Change format of the image
if (make_gray) cvtColor(frame, img_aux, CV_BGR2GRAY);
else if (use_gpu) cvtColor(frame, img_aux, CV_BGR2BGRA);
else Left.copyTo(img_aux);
// Resize image
if (args.resize_src) resize(img_aux, img, Size(args.width, args.height));
else img = img_aux;
img_to_show = img;
gpu_hog.nlevels = nlevels;
hogWorkBegin();
if (use_gpu)
{
gpu_img.upload(img);
new_img.upload(img_new);
fs.open(cbuff, FileStorage::WRITE);
for(int levels = 0; levels < nlevels; levels++)
{
gpu_hog.getDescriptors(gpu_img, win_stride, descriptors, cv::gpu::HOGDescriptor::DESCR_FORMAT_ROW_BY_ROW);
descriptors.download(temp);
//printf("size %d %d\n", temp.rows, temp.cols);
fs <<"level" << levels;
fs << "features" << temp;
cout<<"("<<width<<","<<height<<")"<<endl;
width = round(width/scale);
height = round(height/scale);
if( width < win_size.width || height < win_size.height )
break;
cout<<"Levels "<<levels<<endl;
resize(img,img_new,Size(width,height));
scale *= scale;
}
cout<<count<< " Image feature calculated !"<<endl;
count++;
//width = 640; height = 480;
scale = 1.05;
}
hogWorkEnd();
fs.release();
}
else running = false;
}
}
The window size should be chosen, s.t. the object(s) you want to detect fit into the window. If you want to have different window sizes for different types this might become tricky.
Usually what you do is the following
If you want to use differently sized detection windows you will get feature vectors of different size (by nature of the HoG features). The tricky thing is, that in the testing phase you have to use as many sliding windows as object types you use. This would definitely work, but you have to process each testing image several times leading to higher processing time)
To answer your question of the sizes: There is no value I can give you, it always depends on your images. Using an image pyramid as you mentioned above is a good way to deal with differently scaled objects.
Sample code for visualization of HoG features can be found here. This also helps understand how the feature vectors look like.
EDIT: Found out the hard way, that only cv::Size(8,8)
is allowed for cell size. See documentation.