Tags: computer-vision, conv-neural-network, caffe, image-segmentation, semantic-segmentation

Can Caffe classify pixels of an image directly?


I would like to classify the pixels of an image as "is street" or "is not street". I have some training data from the KITTI dataset, and I have seen that Caffe has an IMAGE_DATA layer type. The labels are provided as images of the same size as the input image.
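For per-pixel classification, each label image ultimately has to become a map of integer class indices with the same height and width as the input, one label per pixel. Below is a minimal sketch, assuming the label image has already been decoded into rows of RGB tuples and that street pixels are marked with a single color. `ROAD_COLOR` is a hypothetical placeholder, so check the actual encoding of the ground-truth files you use.

```python
# Hypothetical color that marks "street" pixels in the label image;
# verify it against the ground-truth encoding of your dataset.
ROAD_COLOR = (255, 0, 255)


def label_image_to_mask(label_rgb, road_color=ROAD_COLOR):
    """Convert an H x W label image (rows of RGB tuples) into an
    H x W map of class indices: 1 = street, 0 = not street."""
    return [[1 if pixel == road_color else 0 for pixel in row]
            for row in label_rgb]


if __name__ == "__main__":
    label = [
        [(255, 0, 0), (255, 0, 255)],
        [(255, 0, 255), (255, 0, 255)],
    ]
    print(label_image_to_mask(label))  # [[0, 1], [1, 1]]
```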

Besides Caffe, my first idea for solving this problem was to feed the classifier image patches around each pixel to be classified (e.g. 20 pixels to the top, left, right and bottom, resulting in 41×41 = 1681 features per pixel).
However, if I could tell Caffe how to use the labels without having to create those image patches manually (and the IMAGE_DATA layer type seems to suggest that this is possible), I would prefer that.
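The manual patch approach described above can be sketched as follows: for every pixel, cut out the (2r+1)×(2r+1) window centered on it, where r = 20 gives the 41×41 = 1681 features per pixel. Near the image border part of the window falls outside the image. This sketch assumes constant padding there (`pad_value` is an illustrative choice; mirroring or replicating the edge would also work) and a single-channel image stored as a list of rows.

```python
def extract_patch(image, y, x, radius=20, pad_value=0):
    """Return the (2*radius+1)**2 values of the window centered on (y, x),
    flattened row by row. Positions outside the image get pad_value."""
    height, width = len(image), len(image[0])
    features = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < height and 0 <= xx < width:
                features.append(image[yy][xx])
            else:
                features.append(pad_value)  # window sticks out of the image
    return features


def extract_all_patches(image, radius=20, pad_value=0):
    """One feature vector per pixel, in row-major pixel order."""
    return [extract_patch(image, y, x, radius, pad_value)
            for y in range(len(image))
            for x in range(len(image[0]))]


if __name__ == "__main__":
    img = [[r * 10 + c for c in range(50)] for r in range(50)]
    print(len(extract_patch(img, 25, 25)))  # 1681
    # Corner pixel with a 3x3 window; -1 marks padded positions.
    print(extract_patch(img, 0, 0, radius=1, pad_value=-1))
    # [-1, -1, -1, -1, 0, 1, -1, 10, 11]
```

This yields one 1681-dimensional sample per pixel, so a 1242×375 image alone produces 465,750 heavily overlapping patches; that redundancy is the main cost of this approach.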

Can Caffe classify pixels of an image directly? What would such a prototxt network definition look like? How do I give Caffe the information about the labels?

I guess the input layer would be something like:

layers {
  name: "data"
  type: IMAGE_DATA
  top: "data"
  top: "label"
  image_data_param {
    source: "path/to/file_list.txt"
    mean_file: "path/to/imagenet_mean.binaryproto"
    batch_size: 4
    crop_size: 41
    mirror: false
    new_height: 256
    new_width: 256
  }
}

However, I am not sure what exactly crop_size means. Is the crop really centered? How does Caffe deal with the corner pixels? What are new_height and new_width for?
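As far as I understand Caffe's data transformation code, `crop_size` cuts one crop_size × crop_size window out of each image (after it has been resized to `new_height` × `new_width` by the IMAGE_DATA layer); it does not build a window around every pixel. In the TRAIN phase the window offset is random, otherwise the window is centered, so corner pixels get no special treatment: they are simply inside or outside the crop. Also note that the `label` top of this layer is the single integer written after each file name in the list file, so it cannot carry a label image. The sketch below mimics that offset logic; `crop_offsets` and `crop` are illustrative names, not Caffe API.

```python
import random


def crop_offsets(height, width, crop_size, phase, rng=random):
    """Return (h_off, w_off) of the crop window: random in TRAIN,
    centered in any other phase (mimicking Caffe's behavior)."""
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size is larger than the (resized) image")
    if phase == "TRAIN":
        # randint is inclusive, so every valid offset 0..H-crop_size can occur
        h_off = rng.randint(0, height - crop_size)
        w_off = rng.randint(0, width - crop_size)
    else:
        h_off = (height - crop_size) // 2
        w_off = (width - crop_size) // 2
    return h_off, w_off


def crop(image, crop_size, phase, rng=random):
    """Cut a single crop_size x crop_size window out of an image (list of rows)."""
    h_off, w_off = crop_offsets(len(image), len(image[0]), crop_size, phase, rng)
    return [row[w_off:w_off + crop_size]
            for row in image[h_off:h_off + crop_size]]


if __name__ == "__main__":
    # new_height = new_width = 256 and crop_size = 41, as in the question:
    print(crop_offsets(256, 256, 41, "TEST"))  # (107, 107)
```

With the settings from the question, the network would therefore see one 41×41 patch per image together with one image-level label, not a label per pixel.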


Solution

  • It seems you can use fully convolutional networks (FCNs) for semantic segmentation.

    The FCN paper is listed on Caffe's publications page: https://github.com/BVLC/caffe/wiki/Publications

    The models are available in the Caffe Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo#fully-convolutional-semantic-segmentation-models-fcn-xs

    This presentation may also be helpful: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-pixels.pdf
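In a fully convolutional network the last layer outputs one score per class at every pixel (a C×H×W map brought back to the input resolution), and the label is an H×W map of class indices, so no patches have to be cut out by hand. The following is a minimal pure-Python sketch of the per-pixel training loss: a softmax over the class axis at each pixel, with cross-entropy averaged over all pixels except those marked with an optional ignore label. It is meant to approximate what Caffe's `SoftmaxWithLoss` layer computes when given a score map and a same-sized label map; the function names are illustrative, and averaging over non-ignored pixels is an assumption about the normalization mode.

```python
import math


def pixelwise_softmax_loss(scores, labels, ignore_label=None):
    """scores: C x H x W nested lists of raw class scores.
    labels: H x W nested lists of integer class indices.
    Returns the mean cross-entropy over pixels whose label != ignore_label."""
    num_classes = len(scores)
    total, count = 0.0, 0
    for y, label_row in enumerate(labels):
        for x, label in enumerate(label_row):
            if label == ignore_label:
                continue  # e.g. unlabeled pixels do not contribute
            pixel_scores = [scores[c][y][x] for c in range(num_classes)]
            m = max(pixel_scores)  # subtract the max for numerical stability
            log_norm = m + math.log(sum(math.exp(s - m) for s in pixel_scores))
            total += log_norm - pixel_scores[label]  # -log softmax(label)
            count += 1
    return total / count if count else 0.0


def pixelwise_argmax(scores):
    """Predicted class index per pixel (e.g. 1 = street, 0 = not street)."""
    num_classes, height, width = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(num_classes), key=lambda c: scores[c][y][x])
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    # Two classes on a 1 x 2 image; equal scores give a loss of log(2).
    scores = [[[0.0, 0.0]], [[0.0, 0.0]]]
    print(pixelwise_softmax_loss(scores, [[0, 1]]))  # 0.6931471805599453
    print(pixelwise_argmax([[[2.0, -1.0]], [[0.0, 3.0]]]))  # [[0, 1]]
```

In Caffe terms, the label map would be passed as the second bottom of a `SoftmaxWithLoss` layer, whose `loss_param` accepts an `ignore_label`.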