Tags: neural-network, caffe, torch, conv-neural-network

Using Neural Networks for Data Manipulation


Perhaps related to this question, but my goal is to have a network perform a manipulation on an input image and output the resulting image data.

If this question lacks clarity, I'd be glad to go into more detail about my problem in the comments. However, I'll try to stay as case-agnostic as possible so the question is useful to others.

The Problem

I have a plethora of training data consisting of images before and after the proposed manipulation. My question is how I can train the network in Caffe to map each input pixel to the corresponding output pixel (a 1-to-1 pixel mapping). My loss should be something that computes the difference between the two images.

If I have my last fully-connected/inner-product layer outputting channels * height * width and I have my label of the expected output image (same dimensions) what type of loss+accuracy structure should I use?
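To make the shapes concrete, here is a small NumPy sketch of that setup (the dimensions are the hypothetical 3 x 256 x 256 from below, not anything prescribed by Caffe):

```python
import numpy as np

# Hypothetical dimensions: 3 channels, 256x256 pixels.
channels, height, width = 3, 256, 256

# The inner-product layer emits one flat vector of length C*H*W per image.
flat_output = np.random.rand(channels * height * width).astype(np.float32)

# Reshape it back into image form so it can be compared to the label image,
# which has the same dimensions.
predicted = flat_output.reshape(channels, height, width)
label = np.random.rand(channels, height, width).astype(np.float32)

# The loss then compares the two element-wise.
assert predicted.shape == label.shape
```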

My Case

I've tried simply passing the inner-product data to a sigmoid cross-entropy loss with my label data, but it doesn't seem to be a supported method.

My labels are non-integer values: pixel RGB data between 0 and 1 (note: I could instead use integers from 0 to 255), and Caffe seems to interpret labels as categories rather than as plain values.

I could have 255 categories per pixel channel, but that would mean 255 * 3 channels * 256 height * 256 width = 50,135,040 categories, which absurdly overcomplicates what I'm trying to achieve.
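The arithmetic above, for the record (using the figures from the question):

```python
# 255 categories per channel, 3 channels, 256x256 pixels.
categories = 255 * 3 * 256 * 256
print(categories)  # 50135040
```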

My Questions

  • Does Caffe natively support what I'm trying to achieve?
    • If so, how should I change my structure to conform to this specification?
    • If not, do any other neural network frameworks such as Torch support this?
  • Is there a name for the type of problem I'm trying to solve with my network (certainly not categorical classification)?
    • What has been used in the past to solve this style of problem?

Solution

  • The loss layer you are looking for is the Euclidean loss layer (mean squared error):

    layer {
      name: "loss"
      type: "EuclideanLoss"   # older Caffe versions: type: EUCLIDEAN_LOSS inside a layers { } block
      bottom: "CONVX_15"
      bottom: "labels"
      top: "loss"
    }
    

    Your problem is multivariate regression, and you have to use a loss that is suitable for it. Sigmoid cross-entropy loss is for classification, where the target values (labels) must lie between 0 and 1 (e.g. the probability that a pixel is on/off).
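The Euclidean loss this layer computes can be sketched in NumPy as follows (a sketch of the formula, not Caffe's actual implementation):

```python
import numpy as np

def euclidean_loss(pred, label):
    """Euclidean loss as Caffe defines it:
    1/(2N) * sum of squared differences, where N is the batch size."""
    n = pred.shape[0]
    diff = pred - label
    return np.sum(diff ** 2) / (2.0 * n)

# Toy batch of two "images" with a single pixel each.
pred = np.array([[0.5], [1.0]])
label = np.array([[0.0], [0.0]])
print(euclidean_loss(pred, label))  # (0.25 + 1.0) / (2 * 2) = 0.3125
```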

    With Euclidean loss, be careful to manage your gradients. Keep your target values in the range [0, 1] and use Xavier weight initialization. Even so, you will probably need a lower learning rate than in classification problems to keep SGD from exploding.
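Both precautions can be sketched in NumPy. The Xavier variant below is the uniform scheme Caffe's "xavier" filler uses (uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)], so the weight variance is 1/fan_in); the layer sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Scale 0-255 pixel labels into [0, 1] so the regression targets,
#    and hence the gradients, stay small.
labels_255 = rng.integers(0, 256, size=(4, 3, 8, 8))  # toy batch of 4 images
labels = labels_255 / 255.0
assert labels.min() >= 0.0 and labels.max() <= 1.0

# 2. Xavier initialization: uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)],
#    which gives the weights a variance of 1/fan_in.
fan_in, fan_out = 512, 256  # hypothetical layer sizes
scale = np.sqrt(3.0 / fan_in)
weights = rng.uniform(-scale, scale, size=(fan_out, fan_in))
```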