machine-learning · statistics · gradient-descent

L1-norm vs L2-norm as cost function when standardizing


I have some data where both the input and the output values are standardized, so the difference between Y and Y_pred is always going to be very small.

I feel that the L2-norm will penalize the model less than the L1-norm, since squaring a number between 0 and 1 always results in a smaller number.

So my question is: is it OK to use the L2-norm when both the input and the output are standardized?


Solution

  • It does not matter.

    The basic idea/motivation is how to penalize deviations. The L1-norm does not care much about outliers, while the L2-norm penalizes them heavily. This is the basic difference, and you will find plenty of pros and cons, even on Wikipedia.

    So regarding your question of whether it makes sense when the expected deviations are small: sure, it behaves the same.

    Let's work through an example:

    y_real = 1.0     ||| y_pred = 0.8     ||| y_pred = 0.6
    l1:                  |0.2| = 0.2          |0.4| = 0.4   => 2x the error
    l2:                  0.2^2 = 0.04         0.4^2 = 0.16  => 4x the error
    

    You see, the basic idea still applies!
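
    For what it's worth, here is a minimal sketch in Python/NumPy that reproduces the example above on standardized residuals (the array values are taken from the example, purely for illustration):

    import numpy as np

    # Two hypothetical predictions for the same standardized target
    # (values taken from the example above).
    y_real = np.array([1.0, 1.0])
    y_pred = np.array([0.8, 0.6])

    residuals = y_real - y_pred       # [0.2, 0.4]

    l1 = np.abs(residuals)            # [0.2 , 0.4 ]  -> second error is 2x the first
    l2 = residuals ** 2               # [0.04, 0.16]  -> second error is 4x the first

    print("L1 penalties:", l1, "ratio:", l1[1] / l1[0])   # ratio: 2.0
    print("L2 penalties:", l2, "ratio:", l2[1] / l2[0])   # ratio: 4.0

    Even though each individual L2 penalty is numerically smaller than its L1 counterpart (because the residuals lie between 0 and 1), the L2 penalty still grows faster relative to the size of the error, and that relative behavior is what matters to the optimizer.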