NOTE: I am new to MXNet. It seems that the Gluon module is meant to replace(?) the Symbol module as the high-level neural network (nn) interface, so this question specifically seeks an answer utilizing the Gluon module.
Residual neural networks (res-NNs) are a fairly popular architecture (the link provides a review of res-NNs). In brief, a res-NN is an architecture where the input undergoes a (series of) transformation(s) (e.g. through a standard nn layer) and at the end is combined with its unadulterated self prior to an activation function:
So the main question here is "How to implement a res-NN structure with a custom gluon.Block?" What follows is:
Normally sub-questions are seen as concurrent main questions, resulting in the post being flagged as too general. In this case they are legitimate sub-questions, as my inability to solve the main question stems from them, and the partial / first-draft documentation of the Gluon module is insufficient to answer them.
"How to implement a res-NN structure with a custom gluon.Block
?"
First let's do some imports:
import mxnet as mx
import numpy as np
import math
import random
gpu_device=mx.gpu()
ctx = gpu_device
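(Optional, a sketch of my own: fall back to the CPU if no GPU is available; the error only surfaces on the first synchronization, hence the .asnumpy() call.)
try:
    mx.nd.zeros((1,), ctx=mx.gpu()).asnumpy()   # force synchronization on the GPU context
    ctx = mx.gpu()
except mx.MXNetError:
    ctx = mx.cpu()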
Prior to defining our res-NN structure, we first define a common convolutional NN (cnn) chunk; namely, convolution → batch norm → ramp.
class CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, stride=1, padding=0, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            # convolution -> batch norm -> ramp (ReLU)
            self.conv = mx.gluon.nn.Conv1D(channels=channels, kernel_size=kernel,
                                           strides=stride, padding=padding)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.ramp(x)
        return x
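As a quick sanity check that this chunk produces the expected shapes (a minimal sketch; the batch size, channel counts and width below are arbitrary):
cnn = CNN1D(channels=16, kernel=3, padding=1)
cnn.collect_params().initialize(ctx=ctx)
x = mx.nd.random.uniform(shape=(8, 4, 32), ctx=ctx)   # Conv1D expects (batch, channels, width)
print(cnn(x).shape)                                   # (8, 16, 32): stride 1 and padding 1 preserve the width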
Subquestion: mx.gluon.nn.Activation vs the NDArray module's nd.relu? When to use which, and why? In all the MXNet tutorials / demos I saw in their documentation, custom gluon.Blocks use nd.relu(x) in the forward function.

Subquestion: self.ramp(self.conv(x)) vs mx.gluon.nn.Conv1D(activation='relu')(x)? I.e. what is the consequence of adding the activation argument to a layer? Does that mean the activation is automatically applied in the forward function when that layer is called?
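(For what it is worth, a minimal check of my own suggests the Activation block and nd.relu compute the same values on the same input:)
ramp = mx.gluon.nn.Activation(activation='relu')   # parameter-free block, so no initialize() needed
x = mx.nd.random.uniform(-1, 1, shape=(2, 5), ctx=ctx)
print(np.allclose(ramp(x).asnumpy(), mx.nd.relu(x).asnumpy()))   # True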
Now that we have a re-usable cnn chunk, let's define a res-NN which chains together chain_length of these cnn chunks. So here is my attempt:
class RES_CNN1D(mx.gluon.Block):
    def __init__(self, channels, kernel, initial_stride, chain_length=1, stride=1, padding=0, **kwargs):
        super(RES_CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            num_rest = chain_length - 1
            self.ramp = mx.gluon.nn.Activation(activation='relu')
            self.init_cnn = CNN1D(channels, kernel, initial_stride, padding)
            # I am guessing this is how to correctly add an arbitrary number of chunks
            self.rest_cnn = mx.gluon.nn.Sequential()
            for i in range(num_rest):
                self.rest_cnn.add(CNN1D(channels, kernel, stride, padding))

    def forward(self, x):
        # make a copy of the untouched input to send through the chunks
        y = x.copy()
        y = self.init_cnn(y)
        # I am guessing that if I call a mx.gluon.nn.Sequential object, all nets inside
        # are called / the input gets passed along all of them?
        y = self.rest_cnn(y)
        y += x
        y = self.ramp(y)
        return y
Subquestion: when adding a variable number of layers, should one use the hacky eval("self.layer" + str(i) + " = mx.gluon.nn.Conv1D()"), or is this what mx.gluon.nn.Sequential is meant for?

Subquestion: when defining the forward function in a custom gluon.Block which has an instance of mx.gluon.nn.Sequential (let us refer to it as self.seq), does self.seq(x) just pass the argument x down the line? E.g. if this is self.seq:

self.seq = mx.gluon.nn.Sequential()
self.conv1 = mx.gluon.nn.Conv1D()
self.conv2 = mx.gluon.nn.Conv1D()
self.seq.add(self.conv1)
self.seq.add(self.conv2)

is self.seq(x) equivalent to self.conv2(self.conv1(x))?
Is this correct?
The desired result for RES_CNN1D(10, 3, 2, chain_length=3) should look like this:
Conv1D(10, 3, stride=2) -----
BatchNorm |
Ramp |
Conv1D(10, 3) |
BatchNorm |
Ramp |
Conv1D(10, 3) |
BatchNorm |
Ramp |
| |
(+)<-------------------------
v
Ramp
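To at least confirm the block runs end-to-end, here is a minimal shape check (a sketch of my own: stride 1, padding 1 and an input whose channel count matches channels, so that the residual add y += x is shape-compatible; with initial_stride=2 as in the diagram, the skip path would need downsampling too):
res = RES_CNN1D(channels=10, kernel=3, initial_stride=1, chain_length=3, padding=1)
res.collect_params().initialize(ctx=ctx)
x = mx.nd.random.uniform(shape=(4, 10, 64), ctx=ctx)
print(res(x).shape)   # (4, 10, 64)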
self.ramp(self.conv(x)) vs mx.gluon.nn.Conv1D(activation='relu')(x): yes, the latter applies a relu activation to the output of Conv1D, i.e. the activation is applied automatically in that layer's forward.
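A minimal sketch illustrating this, reusing the imports from the question (the layer sizes are made up): with activation='relu' every output element is non-negative, as if relu had been applied to the convolution output.
conv_relu = mx.gluon.nn.Conv1D(channels=4, kernel_size=3, activation='relu')
conv_relu.collect_params().initialize()
x = mx.nd.random.uniform(-1, 1, shape=(1, 2, 16))
print(conv_relu(x).min().asscalar() >= 0)   # True: the relu is applied inside the layer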
mx.gluon.nn.Sequential is for grouping multiple layers into one block. Usually you don't need to explicitly define each layer as a class attribute; you can create a list holding all the layers you want to group, and add each list element to the mx.gluon.nn.Sequential object in a for loop.
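For example (a sketch; the layers and channel counts are arbitrary):
layers = [mx.gluon.nn.Conv1D(channels=c, kernel_size=3) for c in (16, 32, 64)]
seq = mx.gluon.nn.Sequential()
for layer in layers:
    seq.add(layer)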
Yes. Calling forward on mx.gluon.nn.Sequential is equivalent to calling forward on all of its child blocks, in the topological order of the computation graph.
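So for the example in the question, self.seq(x) gives the same result as self.conv2(self.conv1(x)). A quick check (a sketch with arbitrary sizes, reusing the imports from the question):
seq = mx.gluon.nn.Sequential()
conv1 = mx.gluon.nn.Conv1D(channels=8, kernel_size=3)
conv2 = mx.gluon.nn.Conv1D(channels=8, kernel_size=3)
seq.add(conv1)
seq.add(conv2)
seq.collect_params().initialize()   # initializes conv1 and conv2, which are registered as children of seq
x = mx.nd.random.uniform(shape=(1, 4, 32))
print(np.allclose(seq(x).asnumpy(), conv2(conv1(x)).asnumpy()))   # True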