I have a very simple question. When we perform gradient descent with $L_1$ and/or $L_2$ regularization terms, i.e. when we expand the loss function $L$ to
$$ L_r = L + l_1 \sum_i |\pi_i| + l_2 \sum_i \pi_i^2 $$
why do we not also update the coefficients $l_1$ and $l_2$ in the gradient descent update rule, the way we update the weights $\pi_i$?
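For concreteness, here is a minimal NumPy sketch of what I mean by the update rule (the names `sgd_step` and `grad_L` are just illustrative): $l_1$ and $l_2$ scale the penalty gradients in the weight update, but have no update step of their own.

```python
import numpy as np

def sgd_step(w, grad_L, lr, l1, l2):
    """One gradient-descent step on the regularized loss L_r.

    l1 and l2 are fixed hyperparameters: they scale the penalty
    gradients but are never themselves updated.
    """
    # d/dw of l1 * sum|w_i| is l1 * sign(w_i) (subgradient at 0)
    # d/dw of l2 * sum w_i^2 is 2 * l2 * w_i
    grad_Lr = grad_L + l1 * np.sign(w) + 2 * l2 * w
    return w - lr * grad_Lr
```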
They are hyperparameters; you cannot optimize the weights and these coefficients simultaneously. If you did optimize them jointly with the weights against the loss on the training set, the coefficients would be driven to 0, zeroing out the penalty term. A complex model can easily overfit the dataset and predict the training labels perfectly; at that point, the best remaining way for the optimizer to reduce $L_r$ is to shrink the penalty coefficients to zero. So the parameters that were designed to prevent overfitting would do nothing useful.
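To make this concrete: if $l_1$ and $l_2$ were treated as trainable parameters, their gradients would be

$$ \frac{\partial L_r}{\partial l_1} = \sum_i |\pi_i| \ge 0, \qquad \frac{\partial L_r}{\partial l_2} = \sum_i \pi_i^2 \ge 0, $$

so every gradient step $l_1 \leftarrow l_1 - \eta \sum_i |\pi_i|$ can only decrease them, pushing both coefficients toward zero regardless of whether the model is overfitting.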
But you can tune $l_1$ and $l_2$ with a grid search, scoring each candidate on a held-out validation set (or via cross-validation) rather than on the training loss.
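For example, here is a minimal sketch using scikit-learn's `ElasticNet` (which parameterizes the two penalties through `alpha` and `l1_ratio`) together with `GridSearchCV`; the toy data is just for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Toy regression data, just for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

# ElasticNet combines the L1 and L2 penalties; alpha sets their
# overall strength and l1_ratio the mix between them.
search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid={"alpha": [0.001, 0.01, 0.1, 1.0],
                "l1_ratio": [0.1, 0.5, 0.9]},
    cv=5,  # scored on held-out folds, not the training loss
)
search.fit(X, y)
print(search.best_params_)
```

The key point is that each `(alpha, l1_ratio)` candidate is scored on held-out folds, so shrinking the penalty to zero no longer automatically improves the score.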