I am going to build a neural network whose architecture has more than one output branch. More specifically, it is designed to run parallel heads on top of a series of convolutional layers: one branch computes classification results (softmax-like); the other produces regression results. However, I am stuck on designing the model as well as on choosing the loss functions (criterions).
I. Should I use the torch container nn.Parallel() or nn.Concat() for the branch layers on top of the conv layers (nn.Sequential())? What is the difference, apart from the data format?
II. Because there are two kinds of output, a classification loss function and a regression loss function have to be combined linearly. I am wondering whether to choose nn.MultiCriterion() or nn.ParallelCriterion(), depending on the container chosen above, or whether I have to write a custom criterion class.
III. Could anyone who has done similar work tell me whether torch needs additional customization to implement backprop for training? I am concerned about data-structure issues with the torch containers.
Concat and Parallel differ in that each module in Concat gets the entire output of the previous layer as input, while each module in Parallel takes a slice of that output. For your purpose you need Concat, not Parallel, since both loss functions need to take the entire output of your sequential network.
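For illustration, here is a minimal sketch of such a two-headed model in Lua/Torch. All layer sizes, the assumed 3x32x32 input, the 10 classes, and the 4 regression outputs are made-up assumptions; adapt them to your data. Note that I use nn.ConcatTable, the table-output variant of Concat, because ParallelCriterion (below) expects the network to output a table:

```lua
require 'nn'

-- shared convolutional trunk (layer sizes are illustrative assumptions)
local trunk = nn.Sequential()
trunk:add(nn.SpatialConvolution(3, 16, 5, 5))   -- 3x32x32 -> 16x28x28
trunk:add(nn.ReLU())
trunk:add(nn.SpatialMaxPooling(2, 2, 2, 2))     -- 16x28x28 -> 16x14x14
trunk:add(nn.View(16 * 14 * 14):setNumInputDims(3))  -- flatten per sample

-- each branch sees the ENTIRE trunk output (Concat-style, not Parallel-style)
local branches = nn.ConcatTable()

local classifier = nn.Sequential()
classifier:add(nn.Linear(16 * 14 * 14, 10))     -- 10 classes (assumed)
classifier:add(nn.LogSoftMax())

local regressor = nn.Sequential()
regressor:add(nn.Linear(16 * 14 * 14, 4))       -- 4 regression outputs (assumed)

branches:add(classifier)
branches:add(regressor)

local model = nn.Sequential()
model:add(trunk)
model:add(branches)  -- forward() now returns {classScores, regressionValues}
```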
Based on the source code of MultiCriterion and ParallelCriterion, they do practically the same thing. The important difference is that in the case of MultiCriterion you provide multiple loss functions but only one target, and they are all computed against that target. Given that you have a classification and a regression task, I assume you have different targets, so you need ParallelCriterion(false), where false enables the multi-target mode (if the argument is true, ParallelCriterion seems to behave identically to MultiCriterion). Then the target is expected to be a table of targets for the individual criterions.
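Continuing the sketch above, the two losses can be combined like this (ClassNLLCriterion pairs with the LogSoftMax branch; the 0.5 weight on the regression loss is an arbitrary assumption you should tune):

```lua
-- one loss per branch; the weights give the linear combination you asked about
local criterion = nn.ParallelCriterion(false)  -- false (the default): one target per criterion
criterion:add(nn.ClassNLLCriterion())          -- classification branch
criterion:add(nn.MSECriterion(), 0.5)          -- regression branch, weight 0.5 (assumed)

-- classTarget: a LongTensor of class indices; regTarget: a Tensor of size 4
local output = model:forward(input)            -- {classScores, regressionValues}
local loss = criterion:forward(output, {classTarget, regTarget})
```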
If you use Concat and ParallelCriterion, torch should be able to compute the gradients properly for you. Both implement updateGradInput, which properly merges the gradients of the individual branches.
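So a hand-written training step only needs the usual forward/backward calls; as a sketch (the learning rate of 0.01 is an assumed hyperparameter, and in practice you would probably drive this with the optim package instead):

```lua
model:zeroGradParameters()
local output = model:forward(input)
local loss = criterion:forward(output, {classTarget, regTarget})
local gradOutput = criterion:backward(output, {classTarget, regTarget})
model:backward(input, gradOutput)  -- ConcatTable sums the gradients from both branches
model:updateParameters(0.01)       -- plain SGD step, lr = 0.01 (assumed)
```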