machine-learning · statistics · gradient-descent

L1-norm vs L2-norm as cost function when standardizing


I have some data where both the input and the output values are standardized, so the difference between Y and Y_pred is always going to be very small.

I feel that the L2-norm will penalize the model less than the L1-norm, since squaring a number between 0 and 1 always results in a smaller number.

So my question is: is it OK to use the L2-norm when both the input and the output are standardized?


Solution

  • It does not matter.

    The basic idea/motivation is how to penalize deviations. The L1-norm does not care much about outliers, while the L2-norm penalizes them heavily. This is the basic difference, and you will find plenty of pros and cons, even on Wikipedia.

    So regarding your question of whether it makes sense when the expected deviations are small: sure, it behaves the same.

    Let's work through an example:

    y_real = 1.0     ||| y_pred = 0.8     ||| y_pred = 0.6
    l1:                  |0.2| = 0.2          |0.4| = 0.4   => 2x the error
    l2:                  0.2^2 = 0.04         0.4^2 = 0.16  => 4x the error
    

    You see, the basic idea still applies!
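
    For what it's worth, here is a minimal sketch in Python/NumPy that reproduces the example above on standardized residuals (the array values are taken from the example, purely for illustration):

    import numpy as np

    # Two hypothetical predictions for the same standardized target
    # (values taken from the example above).
    y_real = np.array([1.0, 1.0])
    y_pred = np.array([0.8, 0.6])

    residuals = y_real - y_pred       # [0.2, 0.4]

    l1 = np.abs(residuals)            # [0.2 , 0.4 ]  -> second error is 2x the first
    l2 = residuals ** 2               # [0.04, 0.16]  -> second error is 4x the first

    print("L1 penalties:", l1, "ratio:", l1[1] / l1[0])   # ratio: 2.0
    print("L2 penalties:", l2, "ratio:", l2[1] / l2[0])   # ratio: 4.0

    Even though each individual L2 penalty is numerically smaller than its L1 counterpart (because the residuals lie between 0 and 1), the L2 penalty still grows faster relative to the size of the error, and that relative behavior is what matters to the optimizer.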