I would like to classify each pixel of an image as "is street" or "is not street". I have some training data from the KITTI dataset, and I have seen that Caffe has an IMAGE_DATA layer type.
The labels are given as images of the same size as the input image.
Apart from Caffe, my first idea for solving this problem was to extract an image patch around each pixel that should be classified (e.g. 20 pixels to the top / left / right / bottom, resulting in 41×41 = 1681 features per pixel I want to classify).
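To make the patch idea concrete, here is a minimal NumPy sketch. It assumes a single-channel image stored as a 2-D array and uses reflection padding so that border pixels also get a full patch; both choices, and the helper name extract_patch, are my own assumptions and not something Caffe does for you.

```python
import numpy as np

def extract_patch(image, row, col, radius=20, pad_mode="reflect"):
    """Return the (2*radius+1) x (2*radius+1) patch centered on (row, col).

    Hypothetical helper: assumes `image` is a 2-D (grayscale) array.
    """
    # Pad every side by `radius` so border pixels still get a full patch.
    padded = np.pad(image, radius, mode=pad_mode)
    size = 2 * radius + 1
    # Pixel (row, col) moved to (row + radius, col + radius) after padding,
    # so the patch centered on it starts at (row, col) in padded coordinates.
    return padded[row:row + size, col:col + size]

# Dummy image standing in for a real KITTI frame.
image = np.arange(100 * 120, dtype=np.float32).reshape(100, 120)

patch = extract_patch(image, 0, 0)      # a corner pixel
features = patch.ravel()                # flatten to one feature vector
print(patch.shape, features.size)
```

Padding the whole image on every call is wasteful; for a full image you would pad once and slice repeatedly. With a three-channel image the feature count per pixel would triple.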
However, if I could tell Caffe how to use the labels without having to create those image patches manually (and the layer type IMAGE_DATA seems to suggest that this is possible), I would prefer that.
Can Caffe classify the pixels of an image directly? What would such a prototxt network definition look like? How do I give Caffe the information about the labels?
I guess the input layer would be something like:
    layers {
      name: "data"
      type: IMAGE_DATA
      top: "data"
      top: "label"
      image_data_param {
        source: "path/to/file_list.txt"
        mean_file: "path/to/imagenet_mean.binaryproto"
        batch_size: 4
        crop_size: 41
        mirror: false
        new_height: 256
        new_width: 256
      }
    }
However, I am not sure what crop_size exactly means. Is it really centered? How does Caffe deal with the corner pixels? What are new_height and new_width good for?
You could try fully convolutional networks (FCNs) for semantic segmentation.
The paper cites Caffe; see the publication list: https://github.com/BVLC/caffe/wiki/Publications
The models are available here: https://github.com/BVLC/caffe/wiki/Model-Zoo#fully-convolutional-semantic-segmentation-models-fcn-xs
This presentation may also be helpful: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-pixels.pdf
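To illustrate why a fully convolutional approach fits label images of the same size as the input, here is a rough NumPy sketch. It is not Caffe code and not the FCN architecture itself: it only shows a single linear classifier shared across all pixel positions (which is what a 1×1 convolution computes), producing one prediction per pixel that can be compared directly with a label image. The feature map, weights, and label image are random placeholders, and the function name is hypothetical.

```python
import numpy as np

def pixelwise_scores(features, weights, bias):
    """Apply the same linear classifier at every pixel (a 1x1 convolution).

    features: (H, W, C) feature map
    weights:  (C, K) classifier weights, K = number of classes
    bias:     (K,)
    returns:  (H, W, K) class scores, one vector per pixel
    """
    return features @ weights + bias

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6, 3))   # placeholder for learned features
weights = rng.normal(size=(3, 2))       # 2 classes: not street / street
bias = np.zeros(2)

scores = pixelwise_scores(features, weights, bias)
prediction = scores.argmax(axis=-1)     # (H, W) map of class indices

# Placeholder label image with the same height and width as the input.
label_image = rng.integers(0, 2, size=(4, 6))
accuracy = (prediction == label_image).mean()
print(prediction.shape, accuracy)
```

Because the output keeps the spatial layout of the input, the loss can be computed pixel by pixel against the label image, with no manual patch extraction.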