Tags: caffe, pycaffe

caffe python: implementing a residual layer


I am trying to implement a residual layer for a CNN (using caffe and python). This is a simple block diagram of residual learning, in which the block computes F(x) + x:

[block diagram: the input x passes through the weight layers to produce F(x), while a skip connection carries x around them to an element-wise sum]

This is the code I've written:

def res(self, bottom, args):
    'residual layer'
    # L below is caffe.layers (from caffe import layers as L)

    rp = {'negative_slope': 0}

    if len(args) != 6:
        raise Exception('res requires 6 arguments: ks, stride, pad, group, nout, bias')
    ks, stride, pad, group, nout, bias = [int(x) for x in args]
    bias = bool(bias)
    cp = { 'kernel_size'     : [1, ks],
           'stride'          : [1, stride],
           'pad'             : [0, pad],
           'group'           : group,
           'num_output'      : nout,
           'bias_term'       : bias,
           'axis'            : 1,
           'weight_filler'   : { 'type': 'xavier' },
           'bias_filler'     : { 'type': 'constant', 'value':0.0 },
           }

    # multipliers for learning rate and decay of weights and bias
    p  = [{'lr_mult':1, 'decay_mult':1}]
    if bias:
        p.append({'lr_mult':2, 'decay_mult':0})

    myconv1 = L.Convolution(bottom, param=p, convolution_param=cp)

    rconv1 = L.ReLU(myconv1, relu_param=rp, in_place=True)

    cp2 = { 'kernel_size'     : [1, ks],
            'stride'          : [1, stride],
            'pad'             : [0, pad+2],
            'group'           : group,
            'num_output'      : nout,
            'bias_term'       : bias,
            'axis'            : 1,
            'weight_filler'   : { 'type': 'xavier' },
            'bias_filler'     : { 'type': 'constant', 'value': 0.0 },
            }

    myconv2 = L.Convolution(rconv1, param=p, convolution_param=cp2)

    forSum = []
    forSum.append(bottom)
    forSum.append(myconv2)

    ep = { 'operation' : 1 }
    return L.Eltwise(*forSum, eltwise_param=ep)

And this is the error I get for the architecture c:3:1:0:1:16:0 cr mp:2:2 res:3:1:0:1:16:0 cr mp:2:2 fc:20:0:

    python /afs/in2p3.fr/home/n/nhatami/sps/spectroML/src/python/makeSpectroNet.py -label label -n CNN_062 -bs 10 res/2048_1e5_0.00_s/CNN_062_bs10/CNN_062_tmp/CNN_062 data/2048_1e5_0.00/2048_1e5_0.00_s c:3:1:0:1:16:0 cr mp:2:2 res:3:1:0:1:16:0 cr mp:2:2 fc:20:0
    Namespace(batchSize=10, droot='data/2048_1e5_0.00/2048_1e5_0.00_s', label='label', layers=['c:3:1:0:1:16:0', 'cr', 'mp:2:2', 'res:3:1:0:1:16:0', 'cr', 'mp:2:2', 'fc:20:0'], name='CNN_062', oroot='res/2048_1e5_0.00_s/CNN_062_bs10/CNN_062_tmp/CNN_062')
    data/2048_1e5_0.00/2048_1e5_0.00_s data/2048_1e5_0.00/2048_1e5_0.00_s_train_list.txt data/2048_1e5_0.00/2048_1e5_0.00_s_val_list.txt
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    I0208 18:00:05.952062 194649 upgrade_proto.cpp:67] Attempting to upgrade input file specified using deprecated input fields: res/2048_1e5_0.00_s/CNN_062_bs10/CNN_062_tmp/CNN_062_deploy.txt
    I0208 18:00:05.952121 194649 upgrade_proto.cpp:70] Successfully upgraded file specified using deprecated input fields.
    W0208 18:00:05.952126 194649 upgrade_proto.cpp:72] Note that future Caffe releases will only support input layers and not input fields.
    I0208 18:00:06.349092 194649 net.cpp:51] Initializing net from parameters:
    name: "CNN_062"
    state {
      phase: TEST
      level: 0
    }
    layer {
      name: "input"
      type: "Input"
      top: "data"
      input_param {
        shape {
          dim: 1
          dim: 2
          dim: 1
          dim: 2048
        }
      }
    }
    layer {
      name: "conv1"
      type: "Convolution"
      bottom: "data"
      top: "conv1"
      param {
        lr_mult: 1
        decay_mult: 1
      }
      convolution_param {
        num_output: 16
        bias_term: false
        pad: 0
        pad: 0
        kernel_size: 1
        kernel_size: 3
        group: 1
    stride: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    axis: 1
  }
}
layer {
  name: "Scale1"
  type: "Scale"
  bottom: "conv1"
  top: "Scale1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler {
      type: "constant"
      value: -1
    }
  }
}
layer {
  name: "ReLU1"
  type: "ReLU"
  bottom: "Scale1"
  top: "ReLU1"
  relu_param {
    negative_slope: 0
  }
}
layer {
  name: "Scale2"
  type: "Scale"
  bottom: "ReLU1"
  top: "Scale2"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler {
      type: "constant"
      value: -1
    }
  }
}
layer {
  name: "ReLU2"
  type: "ReLU"
  bottom: "conv1"
  top: "ReLU2"
  relu_param {
    negative_slope: 0
  }
}
layer {
  name: "crelu1"
  type: "Concat"
  bottom: "Scale2"
  bottom: "ReLU2"
  top: "crelu1"
}
layer {
  name: "maxPool1"
  type: "Pooling"
  bottom: "crelu1"
  top: "maxPool1"
  pooling_param {
    pool: MAX
    kernel_h: 1
    kernel_w: 2
    stride_h: 1
    stride_w: 2
    pad_h: 0
    pad_w: 0
  }
}
layer {
  name: "Convolution1"
  type: "Convolution"
  bottom: "maxPool1"
  top: "Convolution1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  convolution_param {
    num_output: 16
    bias_term: false
    pad: 0
    pad: 0
    kernel_size: 1
    kernel_size: 3
    group: 1
    stride: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    axis: 1
  }
}
layer {
  name: "ReLU3"
  type: "ReLU"
  bottom: "Convolution1"
  top: "Convolution1"
  relu_param {
    negative_slope: 0
  }
}
layer {
  name: "Convolution2"
  type: "Convolution"
  bottom: "Convolution1"
  top: "Convolution2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  convolution_param {
    num_output: 16
    bias_term: false
    pad: 0
    pad: 2
    kernel_size: 1
    kernel_size: 3
    group: 1
    stride: 1
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    axis: 1
  }
}
layer {
  name: "res1"
  type: "Eltwise"
  bottom: "maxPool1"
  bottom: "Convolution2"
  top: "res1"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "Scale3"
  type: "Scale"
  bottom: "res1"
  top: "Scale3"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler {
      type: "constant"
      value: -1
    }
  }
}
layer {
  name: "ReLU4"
  type: "ReLU"
         bottom: "Scale3"
  top: "ReLU4"
  relu_param {
    negative_slope: 0
  }
}
layer {
  name: "Scale4"
  type: "Scale"
  bottom: "ReLU4"
  top: "Scale4"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler {
      type: "constant"
      value: -1
    }
  }
}
layer {
  name: "ReLU5"
  type: "ReLU"
  bottom: "res1"
  top: "ReLU5"
  relu_param {
    negative_slope: 0
  }
}
layer {
  name: "crelu2"
  type: "Concat"
  bottom: "Scale4"
  bottom: "ReLU5"
  top: "crelu2"
}
layer {
  name: "maxPool2"
  type: "Pooling"
        bottom: "crelu2"
  top: "maxPool2"
  pooling_param {
    pool: MAX
    kernel_h: 1
    kernel_w: 2
    stride_h: 1
    stride_w: 2
    pad_h: 0
    pad_w: 0
  }
}
layer {
  name: "ampl"
  type: "InnerProduct"
  bottom: "maxPool2"
  top: "ampl"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  inner_product_param {
    num_output: 20
    bias_term: false
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
I0208 18:00:06.349267 194649 layer_factory.hpp:77] Creating layer input
I0208 18:00:06.349287 194649 net.cpp:84] Creating Layer input
I0208 18:00:06.349298 194649 net.cpp:380] input -> data
I0208 18:00:06.349334 194649 net.cpp:122] Setting up input
I0208 18:00:06.349346 194649 net.cpp:129] Top shape: 1 2 1 2048 (4096)
I0208 18:00:06.349351 194649 net.cpp:137] Memory required for data: 16384
I0208 18:00:06.349356 194649 layer_factory.hpp:77] Creating layer conv1
I0208 18:00:06.349371 194649 net.cpp:84] Creating Layer conv1
I0208 18:00:06.349376 194649 net.cpp:406] conv1 <- data
I0208 18:00:06.349556 194649 net.cpp:380] conv1_conv1_0_split -> conv1_conv1_0_split_1
I0208 18:00:06.349568 194649 net.cpp:122] Setting up conv1_conv1_0_split
I0208 18:00:06.349575 194649 net.cpp:129] Top shape: 1 16 1 2046 (32736)
I0208 18:00:06.349580 194649 net.cpp:129] Top shape: 1 16 1 2046 (32736)
I0208 18:00:06.349583 194649 net.cpp:137] Memory required for data: 409216
I0208 18:00:06.349587 194649 layer_factory.hpp:77] Creating layer Scale1
I0208 18:00:06.349598 194649 net.cpp:84] Creating Layer Scale1
I0208 18:00:06.349603 194649 net.cpp:406] Scale1 <- conv1_conv1_0_split_0
I0208 18:00:06.349611 194649 net.cpp:380] Scale1 -> Scale1
I0208 18:00:06.349642 194649 net.cpp:122] Setting up Scale1
I0208 18:00:06.349647 194649 net.cpp:129] Top shape: 1 16 1 2046 (32736)
I0208 18:00:06.349651 194649 net.cpp:137] Memory required for data: 540160
I0208 18:00:06.349659 194649 layer_factory.hpp:77] Creating layer ReLU1
I0208 18:00:06.349668 194649 net.cpp:84] Creating Layer ReLU1
I0208 18:00:06.349673 194649 net.cpp:406] ReLU1 <- Scale1
I0208 18:00:06.349679 194649 net.cpp:380] ReLU1 -> ReLU1
I0208 18:00:06.349689 194649 net.cpp:122] Setting up ReLU1
I0208 18:00:06.349694 194649 net.cpp:129] Top shape: 1 16 1 2046 (32736)
I0208 18:00:06.349699 194649 net.cpp:137] Memory required for data: 671104
I0208 18:00:06.349702 194649 layer_factory.hpp:77] Creating layer Scale2
I0208 18:00:06.349709 194649 net.cpp:84] Creating Layer Scale2
I0208 18:00:06.349714 194649 net.cpp:406] Scale2 <- ReLU1
I0208 18:00:06.349720 194649 net.cpp:380] Scale2 -> Scale2
I0208 18:00:06.349741 194649 net.cpp:122] Setting up Scale2
I0208 18:00:06.349747 194649 net.cpp:129] Top shape: 1 16 1 2046 (32736)
I0208 18:00:06.349751 194649 net.cpp:137] Memory required for data: 802048
I0208 18:00:06.349758 194649 layer_factory.hpp:77] Creating layer ReLU2
I0208 18:00:06.349771 194649 net.cpp:84] Creating Layer ReLU2
I0208 18:00:06.349776 194649 net.cpp:406] ReLU2 <- conv1_conv1_0_split_1
I0208 18:00:06.349782 194649 net.cpp:380] ReLU2 -> ReLU2
I0208 18:00:06.349789 194649 net.cpp:122] Setting up ReLU2
I0208 18:00:06.349795 194649 net.cpp:129] Top shape: 1 16 1 2046 (32736)
I0208 18:00:06.349799 194649 net.cpp:137] Memory required for data: 932992
I0208 18:00:06.349803 194649 layer_factory.hpp:77] Creating layer crelu1
I0208 18:00:06.349812 194649 net.cpp:84] Creating Layer crelu1
I0208 18:00:06.349815 194649 net.cpp:406] crelu1 <- Scale2
I0208 18:00:06.349822 194649 net.cpp:406] crelu1 <- ReLU2
I0208 18:00:06.349829 194649 net.cpp:380] crelu1 -> crelu1
I0208 18:00:06.349843 194649 net.cpp:122] Setting up crelu1
I0208 18:00:06.349848 194649 net.cpp:129] Top shape: 1 32 1 2046 (65472)
I0208 18:00:06.349853 194649 net.cpp:137] Memory required for data: 1194880
I0208 18:00:06.349856 194649 layer_factory.hpp:77] Creating layer maxPool1
I0208 18:00:06.349864 194649 net.cpp:84] Creating Layer maxPool1
I0208 18:00:06.349870 194649 net.cpp:406] maxPool1 <- crelu1
I0208 18:00:06.349876 194649 net.cpp:380] maxPool1 -> maxPool1
I0208 18:00:06.349891 194649 net.cpp:122] Setting up maxPool1
I0208 18:00:06.349897 194649 net.cpp:129] Top shape: 1 32 1 1023 (32736)
I0208 18:00:06.349901 194649 net.cpp:137] Memory required for data: 1325824
I0208 18:00:06.349905 194649 layer_factory.hpp:77] Creating layer maxPool1_maxPool1_0_split
I0208 18:00:06.349911 194649 net.cpp:84] Creating Layer maxPool1_maxPool1_0_split
I0208 18:00:06.349915 194649 net.cpp:406] maxPool1_maxPool1_0_split <- maxPool1
I0208 18:00:06.349925 194649 net.cpp:380] maxPool1_maxPool1_0_split -> maxPool1_maxPool1_0_split_0
I0208 18:00:06.349931 194649 net.cpp:380] maxPool1_maxPool1_0_split -> maxPool1_maxPool1_0_split_1
I0208 18:00:06.349937 194649 net.cpp:122] Setting up maxPool1_maxPool1_0_split
I0208 18:00:06.349943 194649 net.cpp:129] Top shape: 1 32 1 1023 (32736)
I0208 18:00:06.349948 194649 net.cpp:129] Top shape: 1 32 1 1023 (32736)
I0208 18:00:06.349952 194649 net.cpp:137] Memory required for data: 1587712
I0208 18:00:06.349962 194649 layer_factory.hpp:77] Creating layer Convolution1
I0208 18:00:06.349973 194649 net.cpp:84] Creating Layer Convolution1
I0208 18:00:06.349983 194649 net.cpp:406] Convolution1 <- maxPool1_maxPool1_0_split_0
I0208 18:00:06.349999 194649 net.cpp:380] Convolution1 -> Convolution1
I0208 18:00:06.350034 194649 net.cpp:122] Setting up Convolution1
I0208 18:00:06.350040 194649 net.cpp:129] Top shape: 1 16 1 1021 (16336)
I0208 18:00:06.350044 194649 net.cpp:137] Memory required for data: 1653056
I0208 18:00:06.350050 194649 layer_factory.hpp:77] Creating layer ReLU3
I0208 18:00:06.350056 194649 net.cpp:84] Creating Layer ReLU3
I0208 18:00:06.350061 194649 net.cpp:406] ReLU3 <- Convolution1
I0208 18:00:06.350067 194649 net.cpp:367] ReLU3 -> Convolution1 (in-place)
I0208 18:00:06.350075 194649 net.cpp:122] Setting up ReLU3
I0208 18:00:06.350080 194649 net.cpp:129] Top shape: 1 16 1 1021 (16336)
I0208 18:00:06.350083 194649 net.cpp:137] Memory required for data: 1718400
I0208 18:00:06.350087 194649 layer_factory.hpp:77] Creating layer Convolution2
I0208 18:00:06.350095 194649 net.cpp:84] Creating Layer Convolution2
I0208 18:00:06.350100 194649 net.cpp:406] Convolution2 <- Convolution1
I0208 18:00:06.350108 194649 net.cpp:380] Convolution2 -> Convolution2
I0208 18:00:06.350132 194649 net.cpp:122] Setting up Convolution2
I0208 18:00:06.350138 194649 net.cpp:129] Top shape: 1 16 1 1023 (16368)
I0208 18:00:06.350142 194649 net.cpp:137] Memory required for data: 1783872
I0208 18:00:06.350149 194649 layer_factory.hpp:77] Creating layer res1
I0208 18:00:06.350158 194649 net.cpp:84] Creating Layer res1
I0208 18:00:06.350163 194649 net.cpp:406] res1 <- maxPool1_maxPool1_0_split_1
I0208 18:00:06.350168 194649 net.cpp:406] res1 <- Convolution2
I0208 18:00:06.350178 194649 net.cpp:380] res1 -> res1
F0208 18:00:06.350195 194649 eltwise_layer.cpp:34] Check failed: bottom[0]->shape() == bottom[i]->shape() bottom[0]: 1 32 1 1023 (32736), bottom[1]: 1 16 1 1023 (16368)
*** Check failure stack trace: ***

I would really appreciate your help!


Solution

  • The tricky thing about residual blocks is that x and F(x) must have the same shape, otherwise you cannot compute the sum x + F(x).
    That is exactly what the failed check reports: bottom[0] (your x, the output of maxPool1) has shape 1 32 1 1023, while bottom[1] (your F(x), the output of Convolution2) has shape 1 16 1 1023. The CReLU concat before the pooling doubled the channel count to 32, but the residual branch outputs only nout = 16 channels.
    It is common practice to place a 1x1 conv layer on the residual (skip) link whenever the dimensions of F(x) differ from those of x, for example:
    - when stride != 1 (the spatial dimensions differ)
    - when the number of channels changes (usually at the start of a new "block" in ResNet)
    A sketch of such a projection is shown below.
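
For illustration, here is a minimal sketch of how such a projection could be added at the end of the res function from the question, replacing its forSum/Eltwise tail. It reuses the question's names (bottom, nout, myconv2) and assumes L is caffe.layers; the xavier filler and the bias-free projection are placeholder choices, not a prescribed fix:

    # Hypothetical 1x1 projection on the skip connection: it maps the
    # block input's channel count to nout so both Eltwise bottoms match.
    proj_cp = { 'kernel_size'   : [1, 1],
                'stride'        : [1, 1],
                'pad'           : [0, 0],
                'num_output'    : nout,   # same channel count as myconv2
                'bias_term'     : False,  # assumption: no bias on the projection
                'axis'          : 1,
                'weight_filler' : { 'type': 'xavier' },
                }
    shortcut = L.Convolution(bottom,
                             param=[{'lr_mult': 1, 'decay_mult': 1}],
                             convolution_param=proj_cp)

    ep = { 'operation' : 1 }  # 1 == SUM in caffe's EltwiseParameter
    return L.Eltwise(shortcut, myconv2, eltwise_param=ep)

With this in place both bottoms of the Eltwise sum carry nout channels, and since a 1x1 convolution with stride 1 preserves the spatial size (here 1x1023, which the pad 0 / pad+2 pair of 3-wide convolutions also produces), the shape check should pass.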