Does it make sense to mix regularizers? For example, using L1 to select features in the first layer and L2 for the rest?
I created this model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers

model = Sequential()
# the input layer uses L1 to partially serve as a feature selection layer
model.add(Dense(10, input_dim=train_x.shape[1], activation='swish', kernel_regularizer=regularizers.l1(0.001)))
model.add(Dense(20, activation='swish', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(20, activation='swish', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(10, activation='softmax'))
But I'm not sure whether it is a good idea to mix L1 and L2. To me it seems logical to use L1 as a feature selector in the input layer, yet everywhere I look I only see code that uses the same regularizer for all layers.
(The model seems to give quite good results: over 95% correct predictions in a multiclass classification problem.)
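To sanity-check the feature-selection idea, I assume one can inspect the per-feature weight norms of the trained first layer, roughly like this sketch (the threshold value is arbitrary):

import numpy as np

# After training, the kernel of the first Dense layer has shape
# (n_input_features, 10). If L1 really acts as a feature selector,
# whole rows should be driven close to zero.
first_kernel = model.layers[0].get_weights()[0]
feature_norms = np.linalg.norm(first_kernel, axis=1)

# Arbitrary threshold; features below it are effectively ignored
dropped = np.where(feature_norms < 1e-3)[0]
print("features with near-zero outgoing weights:", dropped)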
Adding different regularizations in different layers is not a problem; there are papers on exactly this topic, e.g. sparse-input neural networks. However, a few things need attention here.
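Before getting to those, here is a rough illustration of the sparse-input idea mentioned above. My understanding is that those papers penalize only the input-layer weights, typically with a group-lasso-style term that zeroes out whole input features rather than individual weights; a minimal Keras sketch of that idea (the helper name and strength value are my own, not taken from the papers):

import tensorflow as tf
from tensorflow.keras.layers import Dense

def group_lasso(strength=0.001):
    # Sums the L2 norms of each input feature's row of outgoing weights,
    # pushing entire rows (i.e. whole input features) towards zero
    # rather than individual weights.
    def penalty(weights):
        return strength * tf.reduce_sum(tf.norm(weights, axis=1))
    return penalty

# Drop-in replacement for the plain L1 penalty on your first layer:
model.add(Dense(10, input_dim=train_x.shape[1], activation='swish',
                kernel_regularizer=group_lasso(0.001)))

Keras accepts any callable as a regularizer, although saving and reloading a model with a custom callable like this may require passing it via custom_objects.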