This question is less about programming itself and more about the logic behind the CNN architecture. I understand how every layer works, but my question is: does it make sense to separate the ReLU and the convolution layer? In other words, can a ConvLayer exist, work, and update its weights via backpropagation without having a ReLU after it?
I thought so. This is why I created the following independent layers:
I am thinking about merging Layers 1 and 2 into one. Which approach should I go for?
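To make the two options concrete, here is a minimal PyTorch-style sketch of what I mean (the channel counts are just placeholders, not my actual layers):

```python
import torch.nn as nn
import torch.nn.functional as F

# Option A: keep the activation as its own layer
separate = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # "Layer 1": convolution only
    nn.ReLU(),                                   # "Layer 2": ReLU as an independent module
)

# Option B: merge the two, i.e. apply ReLU inside the conv layer's forward pass
class ConvReLU(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

merged = ConvReLU(3, 16)
```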
Can it exist?
Yes, it can. Nothing stops a neural network from working without non-linearity modules in the model. The catch is that skipping the non-linearity between two adjacent layers reduces them to a single linear combination: the output of layer 2 is just a linear function of the inputs to layer 1.
M1 : Input =====> L1 ====> ReLU ====> L2 =====> Output
M2 : Input =====> L1 ====> ......... ====> L2 =====> Output
M3 : Input =====> L1 =====> Output
M2 and M3 are equivalent: composing two linear maps gives another linear map (W2(W1x) = (W2W1)x), so the single layer in M3 can learn parameters that produce exactly the same outputs as M2 over the course of training. If there is any (non-linear) pooling in between, this may no longer hold, but as long as the layers are directly consecutive, the network structure is just one large linear combination (think PCA).
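To see the collapse concretely, here is a small NumPy sketch (sizes chosen arbitrarily) that treats the two layers as plain linear maps, which is also what a convolution is under the hood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no non-linearity in between (the M2 setup),
# written as plain matrix multiplications.
W1 = rng.standard_normal((8, 5))   # L1: 5 inputs -> 8 hidden units
W2 = rng.standard_normal((3, 8))   # L2: 8 hidden -> 3 outputs
x = rng.standard_normal(5)

out_m2 = W2 @ (W1 @ x)             # M2: Input -> L1 -> L2 -> Output
W_single = W2 @ W1                 # collapse the two linear maps into one
out_m3 = W_single @ x              # M3: Input -> single layer -> Output

print(np.allclose(out_m2, out_m3))  # True: M2 computes nothing that M3 cannot
```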
Nothing prevents gradient updates and back-propagation from flowing through such a network.
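A quick PyTorch check of that claim (shapes arbitrary): backprop through two conv layers with no activation in between still produces gradients for every parameter:

```python
import torch
import torch.nn as nn

# Two conv layers, no activation in between: backprop still works
# and both layers receive gradients.
net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.Conv2d(4, 1, kernel_size=3, padding=1),
)
x = torch.randn(2, 1, 8, 8)
loss = net(x).pow(2).mean()
loss.backward()

print(all(p.grad is not None for p in net.parameters()))  # True
```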
What should you do?
Keep some form of non-linearity between distinct layers. You may create convolution blocks that contain more than one convolution layer, but you should include a non-linear function at the end of each block, and definitely after the dense layers. For dense layers, not using an activation function makes a stack of them completely equivalent to a single layer.
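As a sketch of that advice in PyTorch (the layer widths, the 32x32x3 input, and the 10 output classes are arbitrary choices for illustration):

```python
import torch.nn as nn

# A conv "block" may stack several conv layers, but the block ends with a
# non-linearity, and every hidden dense layer gets one as well.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),                      # non-linearity closing the conv block
    nn.MaxPool2d(2),                # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 64),
    nn.ReLU(),                      # non-linearity after the hidden dense layer
    nn.Linear(64, 10),              # output layer: raw logits, no ReLU needed
)
```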
Have a look here: Quora: Role of activation functions