I have been trying to implement l1-regularization in Tensorflow using the l1_regularization_strength parameter in the ProximalAdagradOptimizer function from Tensorflow. (I am using this optimizer specifically to get a sparse solution.) I have two questions regarding the regularization.
Regularization applies neither to forward or backpropagation but to the weight updates.
You can use different optimizers for different layers by explicitly passing the variables to minimize to each optimizer.