Tags: neural-network, caffe, torch, conv-neural-network

Using Neural Networks for Data Manipulation


Perhaps related to this question, but my goal is to have a network perform a manipulation on an input image and output the resulting image data.

If this question lacks clarity, I'd be glad to go into more detail about my problem in the comments. However, I'll try to stay as case-agnostic as possible so the question is useful to others.

The Problem

I have a plethora of training data consisting of images before and after the proposed manipulation. My question is how I can train the network in Caffe to map each input pixel to the corresponding output pixel (a 1-to-1 pixel mapping). My loss should be something that computes the difference between the two images.

If I have my last fully-connected/inner-product layer outputting channels * height * width and I have my label of the expected output image (same dimensions) what type of loss+accuracy structure should I use?
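To make the shapes concrete, here is a small NumPy sketch of that setup (the dimensions are the hypothetical 3 x 256 x 256 from below, not anything prescribed by Caffe):

```python
import numpy as np

# Hypothetical dimensions: 3 channels, 256x256 pixels.
channels, height, width = 3, 256, 256

# The inner-product layer emits one flat vector of length C*H*W per image.
flat_output = np.random.rand(channels * height * width).astype(np.float32)

# Reshape it back into image form so it can be compared to the label image,
# which has the same dimensions.
predicted = flat_output.reshape(channels, height, width)
label = np.random.rand(channels, height, width).astype(np.float32)

# The loss then compares the two element-wise.
assert predicted.shape == label.shape
```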

My Case

I've tried simply passing the inner-product data to a sigmoid cross-entropy loss with my label data, but it doesn't seem to be a supported method.

My labels are non-integer values: pixel RGB data between 0 and 1 (note: I could instead use integers from 0 to 255), and Caffe seems to interpret labels as categories rather than as plain values.

I could have 255 categories per pixel channel, but that would mean 255 * 3 channels * 256 height * 256 width = 50,135,040 categories, which absurdly overcomplicates what I'm trying to achieve.
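The arithmetic above, for the record (using the figures from the question):

```python
# 255 categories per channel, 3 channels, 256x256 pixels.
categories = 255 * 3 * 256 * 256
print(categories)  # 50135040
```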

My Questions

  • Does Caffe natively support what I'm trying to achieve?
    • If so, how should I change my structure to conform to this specification?
    • If not, do any other neural network frameworks such as Torch support this?
  • Is there a name for the type of problem I'm trying to solve with my network (certainly not categorical classification)?
    • What has been used in the past to solve this style of problem?

Solution

  • The loss layer you are looking for is the Euclidean loss layer (mean squared error):

    layer {
      name: "loss"
      type: "EuclideanLoss"   # older Caffe versions: type: EUCLIDEAN_LOSS inside a layers { } block
      bottom: "CONVX_15"
      bottom: "labels"
      top: "loss"
    }
    

    Your problem is multivariate regression, and you have to use a loss that is suitable for it. Sigmoid cross-entropy loss is for classification, where the target values (labels) must lie between 0 and 1 (e.g. the probability that a pixel is on/off).
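The Euclidean loss this layer computes can be sketched in NumPy as follows (a sketch of the formula, not Caffe's actual implementation):

```python
import numpy as np

def euclidean_loss(pred, label):
    """Euclidean loss as Caffe defines it:
    1/(2N) * sum of squared differences, where N is the batch size."""
    n = pred.shape[0]
    diff = pred - label
    return np.sum(diff ** 2) / (2.0 * n)

# Toy batch of two "images" with a single pixel each.
pred = np.array([[0.5], [1.0]])
label = np.array([[0.0], [0.0]])
print(euclidean_loss(pred, label))  # (0.25 + 1.0) / (2 * 2) = 0.3125
```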

    With Euclidean loss, be careful to manage your gradients. Keep your target values in the range [0, 1] and use Xavier weight initialization. Even so, you will probably need a lower learning rate than in classification problems to keep SGD from exploding.
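Both precautions can be sketched in NumPy. The Xavier variant below is the uniform scheme Caffe's "xavier" filler uses (uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)], so the weight variance is 1/fan_in); the layer sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Scale 0-255 pixel labels into [0, 1] so the regression targets,
#    and hence the gradients, stay small.
labels_255 = rng.integers(0, 256, size=(4, 3, 8, 8))  # toy batch of 4 images
labels = labels_255 / 255.0
assert labels.min() >= 0.0 and labels.max() <= 1.0

# 2. Xavier initialization: uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)],
#    which gives the weights a variance of 1/fan_in.
fan_in, fan_out = 512, 256  # hypothetical layer sizes
scale = np.sqrt(3.0 / fan_in)
weights = rng.uniform(-scale, scale, size=(fan_out, fan_in))
```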