I'm playing with linear regression in Azure Machine Learning and evaluating a model.
I'm still a bit unsure what the various evaluation metrics mean and show, so I would appreciate a correction if I am wrong.
Are these definitions and assumptions correct?
You are almost correct on most points. To make sure we are talking in the same terms, a little bit of background:
A linear regression uses data on some outcome variable y and independent variables x1, x2, ... and tries to find the linear combination of x1, x2, ... that best predicts y. Once this "best linear combination" is established, you can assess the quality of the fit (i.e. the quality of the model) in multiple ways. The six points you mention are all key metrics for the quality of a regression equation.
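As a rough illustration of finding that linear combination, here is a minimal sketch using NumPy. It assumes "best" means ordinary least squares (the smallest sum of squared errors); the data, the true coefficients, and all variable names are made up for the example and have nothing to do with your Azure experiment.

```python
import numpy as np

# Made-up data: y depends linearly on two independent variables plus noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=200)

# Design matrix with a column of ones so the model can fit an intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares: the coefficients that minimise the sum of squared errors.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predicted value of y for every observation.
y_pred = X @ coef

print("intercept, b1, b2:", coef)
```

The fitted coefficients should land close to the values used to generate the data, because the noise here is small.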
Running a regression gives you multiple "ingredients". For example, every observation gets a predicted value for the outcome variable. The difference between the observed value of y and the predicted value is called the residual or error. Residuals can be negative (if y is overestimated) or positive (if y is underestimated). The closer the residuals are to zero, the better. But what is "close"? The metrics you present are meant to give some insight into this.
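To make the sign convention concrete, here is a small sketch with three made-up observations. It also computes the mean absolute error and the root mean squared error using their usual textbook definitions, which I am assuming match what Azure reports.

```python
import numpy as np

# Three made-up observations and their predictions.
y_observed = np.array([10.0, 12.0, 15.0])
y_predicted = np.array([11.0, 12.0, 13.0])

# Residual = observed minus predicted.
# 1st point: y is overestimated  -> negative residual
# 2nd point: exact prediction    -> zero residual
# 3rd point: y is underestimated -> positive residual
residuals = y_observed - y_predicted

# Two ways of summarising how "close" the residuals are to zero.
mae = np.mean(np.abs(residuals))         # mean absolute error
rmse = np.sqrt(np.mean(residuals ** 2))  # root mean squared error

print(residuals, mae, rmse)
```

Because the errors are squared before averaging, the RMSE penalises large residuals more heavily than the mean absolute error does.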
Relative Absolute Error: The absolute error as a fraction of the real value of the outcome variable y. In your case, the predictions are on average 75% higher/lower than the actual value of y.
Relative Squared Error: The squared error (residual^2) as a fraction of the real value.
Coefficient of Determination: The share of the variation in y that the independent variables account for. In your case, the independent variables can model 38.15% of the variation in y. Also, if you have only one independent variable, this coefficient is equal to the squared correlation coefficient.
Root Mean Squared Error and Coefficient of Determination are the most important metrics in nearly all situations. To be honest, I've never really seen the other metrics being reported.
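The claim about a single independent variable is easy to check numerically. The sketch below uses made-up data and assumes the usual definition of the coefficient of determination, 1 - SS_res / SS_tot, for a least-squares fit with an intercept; the variable names are mine.

```python
import numpy as np

# Made-up data with a single independent variable.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)

# Simple linear regression; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Coefficient of determination: share of the variation in y that is explained.
ss_res = np.sum((y - y_pred) ** 2)    # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r_squared = 1.0 - ss_res / ss_tot

# Pearson correlation between x and y.
corr = np.corrcoef(x, y)[0, 1]

# Root mean squared error, in the same units as y.
rmse = np.sqrt(np.mean((y - y_pred) ** 2))

print(r_squared, corr ** 2, rmse)
```

The first two printed numbers agree up to floating-point error. With more than one independent variable this shortcut no longer applies, and the coefficient has to be computed from the sums of squares.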