machine-learning, statistics, artificial-intelligence, linear-regression, statsmodels

In OLS, why is squaring preferred over taking the absolute value when calculating errors in linear regression?


Why do we use the squared residuals instead of the absolute residuals in OLS estimation?

My idea was that we square the error values so that residuals below the fitted line (which are negative) can still be added to the positive errors. Otherwise, we could end up with a total error of 0 simply because a huge positive error cancels out a huge negative one.

So why do we square it instead of just taking the absolute value? Is it because of the extra penalty for larger errors (instead of an error of 2 counting as 2 times the error of 1, it counts as 4 times the error of 1 when squared)?
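
As a small numerical sketch of the cancellation point above (the residual values are made up for illustration): summing raw residuals can give 0 even when the fit is bad, while absolute and squared residuals both avoid that, and squaring additionally weights large errors more.

```python
import numpy as np

# Hypothetical residuals: one large positive and one large negative error.
residuals = np.array([5.0, -5.0])

print(residuals.sum())           # 0.0  -> raw errors cancel out
print(np.abs(residuals).sum())   # 10.0 -> absolute errors do not cancel
print((residuals ** 2).sum())    # 50.0 -> squared errors do not cancel,
                                 #         and large errors count for more
```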


Solution

  • I feel that large negative residuals (points far below the line) are just as bad as large positive ones (points far above the line). Squaring the residuals treats positive and negative discrepancies in the same way. Compared with taking the absolute value, squaring also penalizes large residuals more heavily and gives a smooth objective that can be minimized in closed form. Why do we sum all the squared residuals? Because no single straight line can minimize every residual simultaneously, so instead we minimize their sum (equivalently, the average squared residual). A comparison of the two criteria is sketched below.
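
Below is a minimal sketch using statsmodels (the question is tagged with it): it fits the same synthetic data with OLS, which minimizes the sum of squared residuals, and with median (quantile) regression, which minimizes the sum of absolute residuals, so the two criteria can be compared. The data-generating line y ≈ 2 + 3x and the variable names are assumptions made for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: a noisy line y ≈ 2 + 3x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 + 3 * x + rng.normal(0, 1, 100)

X = sm.add_constant(x)  # add an intercept column

# OLS: minimizes the sum of squared residuals.
ols_fit = sm.OLS(y, X).fit()

# Median (quantile) regression: minimizes the sum of absolute residuals.
lad_fit = sm.QuantReg(y, X).fit(q=0.5)

print("OLS coefficients:", ols_fit.params)
print("LAD coefficients:", lad_fit.params)
print("Sum of squared residuals (OLS):", (ols_fit.resid ** 2).sum())
```

On this kind of well-behaved data the two fits give similar coefficients; the difference shows up when a few points have very large errors, which the squared criterion penalizes much more strongly.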