deep-learning caffe object-detection leveldb lmdb

How can one create a dataset for object detection in Caffe?


Creating a database (LMDB/LevelDB) for images is trivial in Caffe. But how do we create such a dataset for object detection?
Is this sequence the correct way to go?

  1. Put all images in a folder.
  2. For each image, create a text file with the same name as the corresponding image.
  3. Put the bounding-box coordinates for each object in the image on a separate row of that file.

Now how do I convert such a structure into an LMDB?
Should I convert all the txt files into bytes and save the whole byte stream as one label for each image?
Will Caffe be able to read from such a converted database automatically, or should I create a specific layer for reading it and feeding the network the needed information?


Solution

  • You need to create a custom layer to handle the additional data that has to be included in the LMDB file. For a working example, have a look at Faster R-CNN, which is already implemented in Caffe and does end-to-end detection: https://github.com/rbgirshick/py-faster-rcnn/tree/master/models/coco/VGG_CNN_M_1024/faster_rcnn_end2end

    By looking at the input layer in the prototxt file, you can see that it uses a custom (Python) layer type for the input:

    layer {
      name: 'input-data'
      type: 'Python'
      top: 'data'
      top: 'im_info'
      top: 'gt_boxes'
      python_param {
        module: 'roi_data_layer.layer'
        layer: 'RoIDataLayer'
        param_str: "'num_classes': 81"
      }
    }

    Also, you can see the details of this custom layer here: https://github.com/rbgirshick/fast-rcnn/tree/master/lib/roi_data_layer
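
As a rough illustration of what such a custom layer would have to read, here is a minimal sketch that loads the folder layout described in the question (one text file per image, one box per row) into per-image lists shaped like the rows of a `gt_boxes` blob. The row format `x1 y1 x2 y2 class_id`, the function names `parse_boxes` and `load_annotations`, and the list of image extensions are assumptions made for this example; they are not taken from the question or from the linked repositories, and the code does not touch Caffe or LMDB at all.

```python
import os


def parse_boxes(txt_path):
    """Read one annotation file into a list of [x1, y1, x2, y2, cls] rows.

    Assumed (hypothetical) format: each non-empty line holds
    'x1 y1 x2 y2 class_id', separated by whitespace.
    """
    boxes = []
    with open(txt_path) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) != 5:
                raise ValueError(
                    "%s:%d: expected 5 fields, got %d"
                    % (txt_path, line_no, len(parts))
                )
            x1, y1, x2, y2 = (float(p) for p in parts[:4])
            cls = int(parts[4])
            if x2 <= x1 or y2 <= y1:
                raise ValueError(
                    "%s:%d: box has non-positive width or height"
                    % (txt_path, line_no)
                )
            boxes.append([x1, y1, x2, y2, float(cls)])
    return boxes


def load_annotations(folder, image_exts=(".jpg", ".jpeg", ".png")):
    """Map each image file name in `folder` to its list of boxes.

    An image without a matching .txt file gets an empty list.
    """
    annotations = {}
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in image_exts:
            continue  # ignore the .txt files and anything else
        txt_path = os.path.join(folder, stem + ".txt")
        if os.path.exists(txt_path):
            annotations[name] = parse_boxes(txt_path)
        else:
            annotations[name] = []
    return annotations
```

Calling `load_annotations("path/to/images")` would return a dictionary keyed by image file name. A data layer could then load each image at training time and copy the matching rows into its `gt_boxes` output, rather than trying to squeeze a variable number of boxes into a single label field of the database.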