I do not understand why the code below has both an activation layer and an activation parameter. What is the point of specifying a linear activation and then adding a LeakyReLU layer? And finally, does the loss function act on the raw output of the last layer, or on the output after the last layer's activation?
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, LeakyReLU

fashion_model = Sequential()
fashion_model.add(Conv2D(32, kernel_size=(3, 3), activation='linear', input_shape=(28, 28, 1), padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2), padding='same'))
# ...more code...
fashion_model.add(Dense(num_classes, activation='softmax'))
fashion_model.compile(loss=keras.losses.categorical_crossentropy,
                      optimizer=keras.optimizers.Adam(),
                      metrics=['accuracy'])
'linear' activation means no activation at all: it is just the identity mapping f(x) = x, which is also Keras's default when no activation argument is given. Writing activation='linear' and then adding a LeakyReLU layer is therefore equivalent to applying LeakyReLU directly to the convolution's output; LeakyReLU has to be added as a separate layer because it takes a parameter (alpha=0.1) that cannot be passed through the activation string. As for the loss: it is applied to the output of the model, and the output of this model is the softmax of the last Dense layer, so the loss acts on the activated output.
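As a quick sanity check, here is a minimal sketch (assuming the same keras API as in the question; the toy input and variable names are made up for illustration) showing that activation='linear' leaves the convolution output untouched, so the LeakyReLU layer supplies the only nonlinearity:

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, LeakyReLU

x = np.random.rand(1, 28, 28, 1).astype('float32')

# Conv2D with an explicit activation='linear', then a separate LeakyReLU layer...
a = Sequential([Conv2D(32, (3, 3), activation='linear', padding='same', input_shape=(28, 28, 1)),
                LeakyReLU(alpha=0.1)])

# ...versus a Conv2D with no activation argument at all ('linear' is the default),
# copying the weights across so the comparison is fair
b = Sequential([Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 1)),
                LeakyReLU(alpha=0.1)])
b.set_weights(a.get_weights())

print(np.allclose(a.predict(x), b.predict(x)))  # True: 'linear' changed nothing

And for the loss question: categorical_crossentropy, as compiled above, expects probabilities, i.e. the softmax output of the last layer, not raw pre-activation logits. A tiny worked example (hand-picked numbers, just to show what the loss consumes):

import numpy as np
import keras

y_true = np.array([[0.0, 1.0, 0.0]])  # one-hot true label
y_prob = np.array([[0.1, 0.8, 0.1]])  # what the softmax layer would output

loss = keras.losses.categorical_crossentropy(y_true, y_prob)
print(np.asarray(loss))  # [0.2231...] = -log(0.8)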