machine-learning, neural-network, deep-learning, deep-residual-networks

Can Residual Nets skip one linearity instead of two?


The standard in ResNets is to skip two linearities per residual block. Would skipping only one work as well?


Solution

  • I would refer you to the original paper by Kaiming He et al.

    In sections 3.1-3.2, they define "identity" shortcuts as y = F(x, W) + x, where W are the trainable parameters and F is the residual mapping to be learned. It is important that the residual mapping contains a non-linearity; otherwise the whole construction is just one sophisticated linear layer. The number of linearities, however, is not limited.

    For example, the ResNeXt network creates identity shortcuts around a stack consisting only of convolutional layers (see the figure below), so there are no dense layers in the residual block at all.

    [Figure: ResNeXt residual building blocks]

    The general answer is thus: yes, it would work. In a particular neural network, however, reducing two dense layers to one may be a bad idea, because the residual block must remain flexible enough to learn the residual function. So remember to validate any design you come up with; a minimal sketch of both variants is given after this answer.
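
As an illustration (not part of the original post), here is a minimal PyTorch sketch of two dense residual blocks: the standard variant whose shortcut skips two linearities, and the variant from the question that skips only one. The class names (`TwoLayerResidualBlock`, `OneLayerResidualBlock`) and the layer sizes are arbitrary assumptions made for the example.

```python
import torch
import torch.nn as nn


class TwoLayerResidualBlock(nn.Module):
    """Standard block: the shortcut skips two linearities, y = relu(W2 relu(W1 x) + x)."""

    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.fc2(self.relu(self.fc1(x)))  # F(x, W): contains a non-linearity
        return self.relu(residual + x)               # identity shortcut y = F(x, W) + x


class OneLayerResidualBlock(nn.Module):
    """Variant from the question: the shortcut skips a single linearity, y = relu(W x) + x."""

    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.relu(self.fc(x))  # F(x, W) is still non-linear, so the block is not purely linear
        return residual + x               # identity shortcut


# Quick shape check with arbitrary sizes
x = torch.randn(8, 64)
print(TwoLayerResidualBlock(64)(x).shape)  # torch.Size([8, 64])
print(OneLayerResidualBlock(64)(x).shape)  # torch.Size([8, 64])
```

Both blocks keep the input dimension unchanged so the identity shortcut can be added directly; whether the single-linearity variant is expressive enough to learn your particular residual function is exactly what should be checked empirically.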