I am going to build a neural network whose architecture has more than one output branch. More specifically, it is designed to run parallel heads on top of a series of convolutional layers: one branch computes classification results (softmax-like); the other produces regression results. However, I am stuck on designing the model as well as on choosing the loss functions (criterions).
I. Should I use the torch container nn.Parallel() or nn.Concat() for the branch layers on top of the conv layers (nn.Sequential())? What is the difference, apart from the data format?
II. Because there are two kinds of output, a classification loss function and a regression loss function have to be combined linearly. I am wondering whether to choose nn.MultiCriterion() or nn.ParallelCriterion(), depending on the container chosen above, or whether I have to write a custom criterion class.
III. Could anyone who has done similar work tell me whether torch needs additional customization to implement backprop for training? I am concerned about data-structure issues with the torch containers.
Concat and Parallel differ in that each module in Concat gets the entire output of the previous layer as input, while each module in Parallel takes a slice of that output. For your purpose you need Concat, not Parallel, since both loss functions need to take the entire output of your sequential network.
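For illustration, here is a minimal sketch of such a two-headed model in Lua/Torch. All layer sizes, the assumed 3x32x32 input, the 10 classes, and the 4 regression outputs are made-up assumptions; adapt them to your data. Note that I use nn.ConcatTable, the table-output variant of Concat, because ParallelCriterion (below) expects the network to output a table:

```lua
require 'nn'

-- shared convolutional trunk (layer sizes are illustrative assumptions)
local trunk = nn.Sequential()
trunk:add(nn.SpatialConvolution(3, 16, 5, 5))   -- 3x32x32 -> 16x28x28
trunk:add(nn.ReLU())
trunk:add(nn.SpatialMaxPooling(2, 2, 2, 2))     -- 16x28x28 -> 16x14x14
trunk:add(nn.View(16 * 14 * 14):setNumInputDims(3))  -- flatten per sample

-- each branch sees the ENTIRE trunk output (Concat-style, not Parallel-style)
local branches = nn.ConcatTable()

local classifier = nn.Sequential()
classifier:add(nn.Linear(16 * 14 * 14, 10))     -- 10 classes (assumed)
classifier:add(nn.LogSoftMax())

local regressor = nn.Sequential()
regressor:add(nn.Linear(16 * 14 * 14, 4))       -- 4 regression outputs (assumed)

branches:add(classifier)
branches:add(regressor)

local model = nn.Sequential()
model:add(trunk)
model:add(branches)  -- forward() now returns {classScores, regressionValues}
```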
Based on the source code of MultiCriterion and ParallelCriterion, they do practically the same thing. The important difference is that in the case of MultiCriterion you provide multiple loss functions but only one target, and they are all computed against that target. Given that you have a classification and a regression task, I assume you have different targets, so you need ParallelCriterion(false), where false enables the multi-target mode (if the argument is true, ParallelCriterion seems to behave identically to MultiCriterion). Then the target is expected to be a table of targets for the individual criterions.
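Continuing the sketch above, the two losses can be combined like this (ClassNLLCriterion pairs with the LogSoftMax branch; the 0.5 weight on the regression loss is an arbitrary assumption you should tune):

```lua
-- one loss per branch; the weights give the linear combination you asked about
local criterion = nn.ParallelCriterion(false)  -- false (the default): one target per criterion
criterion:add(nn.ClassNLLCriterion())          -- classification branch
criterion:add(nn.MSECriterion(), 0.5)          -- regression branch, weight 0.5 (assumed)

-- classTarget: a LongTensor of class indices; regTarget: a Tensor of size 4
local output = model:forward(input)            -- {classScores, regressionValues}
local loss = criterion:forward(output, {classTarget, regTarget})
```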
If you use Concat and ParallelCriterion, torch should be able to compute the gradients properly for you. Both implement updateGradInput, which properly merges the gradients of the individual branches.
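So a hand-written training step only needs the usual forward/backward calls; as a sketch (the learning rate of 0.01 is an assumed hyperparameter, and in practice you would probably drive this with the optim package instead):

```lua
model:zeroGradParameters()
local output = model:forward(input)
local loss = criterion:forward(output, {classTarget, regTarget})
local gradOutput = criterion:backward(output, {classTarget, regTarget})
model:backward(input, gradOutput)  -- ConcatTable sums the gradients from both branches
model:updateParameters(0.01)       -- plain SGD step, lr = 0.01 (assumed)
```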