
MATLAB fitlm: OLS vs Robust regression


I am trying to fit a linear regression to some of my data using MATLAB's fitlm function. With ordinary least squares (OLS) I get fairly low R-squared values (~0.2-0.5), and occasionally even unrealistic results, whereas with robust regression (specifically the 'talwar' option) I get much better results (R² ~0.7-0.8).

I am no statistician, so my question is: Is there any reason I should not believe that the robust results are better?

Here is an example of some of the data. For the data shown, OLS gives R² = 0.56 and robust regression gives R² = 0.72.
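One thing to check before trusting the higher number: OLS by construction minimizes the residual sum of squares, so under the ordinary unweighted definition of R² no other line y = a + b·x, robust or not, can score higher on the same data. A robust R² above the OLS one therefore presumably comes from a different, weighted computation (an assumption about what fitlm reports for robust fits; the LinearModel documentation should confirm). The NumPy sketch below illustrates this on synthetic data with three hypothetical outliers; `r_squared` is a helper written for this example, not a MATLAB or NumPy function.

```python
import numpy as np

def r_squared(y, y_hat, w=None):
    """R^2 = 1 - SS_res / SS_tot, optionally with observation weights w."""
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    y_bar = np.sum(w * y) / np.sum(w)          # (weighted) mean of y
    ss_res = np.sum(w * (y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum(w * (y - y_bar) ** 2)      # total sum of squares
    return 1.0 - ss_res / ss_tot

# Synthetic data: y = 1 + 2x + noise, plus three hypothetical gross outliers
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, x.size)
outliers = [5, 20, 33]
y[outliers] += [15.0, -18.0, 20.0]

# OLS line (np.polyfit returns [slope, intercept] for degree 1)
slope_ols, icpt_ols = np.polyfit(x, y, 1)
r2_ols = r_squared(y, icpt_ols + slope_ols * x)

# Any other line; here, the true line the data were generated from
r2_true_line = r_squared(y, 1.0 + 2.0 * x)

# The same line scored with 0/1 weights that drop the outliers,
# mimicking what a hard-rejection (Talwar-style) weighting does
w = np.ones_like(y)
w[outliers] = 0.0
r2_true_line_weighted = r_squared(y, 1.0 + 2.0 * x, w)

print(f"OLS, unweighted R^2:       {r2_ols:.3f}")
print(f"true line, unweighted R^2: {r2_true_line:.3f}")
print(f"true line, weighted R^2:   {r2_true_line_weighted:.3f}")
```

Neither number is wrong as such; they measure fit quality over different sets of points, so the OLS and robust R² values should not be compared as if they were the same statistic.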



Solution

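  • One reason you get notably different R² values is that the Talwar option handles outliers very differently from OLS. In fitlm, 'talwar' is one of the weight functions used for robust fitting by iteratively reweighted least squares: an observation whose scaled residual exceeds a tuning threshold gets weight 0, and every other observation gets weight 1, so points far from the fitted line are effectively dropped from the fit.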

    Taken from the abstract of Talwar's paper:

    'Estimates of the parameters of a linear model are usually obtained by the method of ordinary least-squares (OLS), which is sensitive to large values of the additive error term... we obtain a simple, consistent and asymptotically normal initial estimate of the coefficients, which protects the analyst from large values of εᵢ which are often hard to detect using OLS on a model with many regressors.' (https://www.jstor.org/stable/2285386?seq=1#page_scan_tab_contents)

    Whether Talwar or OLS is better depends on what you know about the measurement process (namely, whether and how the outliers can be explained). If it is appropriate for your data to prune outliers with a Q-test (see http://education.mrsec.wisc.edu/research/topic_guides/outlier_handout.pdf), doing so should shrink the difference in R² you see between Talwar and OLS.
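The Talwar weighting described above can be sketched as iteratively reweighted least squares (IRLS) with a 0/1 weight. This NumPy sketch is a simplification, not MATLAB's implementation: it assumes a tuning constant of 2.795 (the value MATLAB's robustfit documentation lists for 'talwar', as far as I know), uses a median-absolute-residual scale estimate, and omits the leverage adjustment MATLAB applies. `talwar_irls` and the data are hypothetical. Three outliers placed at large x pull the OLS slope noticeably, which is the sensitivity to large error terms that the quoted abstract describes.

```python
import numpy as np

def talwar_irls(x, y, tune=2.795, max_iter=50):
    """Fit y = b0 + b1*x by IRLS with a Talwar (hard-rejection) weight.

    Simplified sketch: scale from the median absolute residual, no
    leverage adjustment. Returns the coefficients [b0, b1] and the
    0/1 weights used in the last weighted fit.
    """
    X = np.column_stack([np.ones_like(x), x])        # design matrix
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    for _ in range(max_iter):
        resid = y - X @ beta
        s = np.median(np.abs(resid)) / 0.6745        # robust scale estimate
        w = (np.abs(resid / (tune * s)) < 1.0).astype(float)  # Talwar: 1 or 0
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.allclose(beta_new, beta, rtol=0.0, atol=1e-12):
            break                                    # coefficients have settled
        beta = beta_new
    return beta_new, w

# Hypothetical data: y = 2 + 0.5x + noise, with outliers at large x
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, x.size)
bad = [25, 27, 29]
y[bad] += 8.0

b1_ols, b0_ols = np.polyfit(x, y, 1)
(b0_rob, b1_rob), w = talwar_irls(x, y)
print(f"OLS:    intercept {b0_ols:.2f}, slope {b1_ols:.2f}")
print(f"Talwar: intercept {b0_rob:.2f}, slope {b1_rob:.2f}")
print("points given weight 0:", np.flatnonzero(w == 0))
```

Because the weights are recomputed on every pass, a point rejected early can re-enter the fit later; the loop stops once the coefficients stop changing.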
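The Q-test pruning suggested above can be sketched as follows. Dixon's Q-test is meant for a single small sample (roughly 3 to 10 values) and tests one suspect value at a time; applying it to the residuals of an OLS fit, as done here, is an assumption about how to use it with regression data, so check it against the linked handout. The critical values are the 95% values as commonly tabulated; `dixon_q_suspect` and the data are hypothetical.

```python
import numpy as np

# Dixon's Q critical values at 95% confidence for n = 3..10
# (as commonly tabulated; check them against the table you use).
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def dixon_q_suspect(values):
    """Return (index, Q) for whichever extreme value has the larger gap."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v)
    s = v[order]
    spread = s[-1] - s[0]                 # range of the sample
    q_low = (s[1] - s[0]) / spread        # gap between the two smallest values
    q_high = (s[-1] - s[-2]) / spread     # gap between the two largest values
    if q_high >= q_low:
        return int(order[-1]), q_high
    return int(order[0]), q_low

# Hypothetical data (roughly y = 2x); the point at x = 6 looks suspicious
x = np.arange(1.0, 9.0)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 17.5, 14.2, 15.9])

slope, icpt = np.polyfit(x, y, 1)
resid = y - (icpt + slope * x)
idx, q = dixon_q_suspect(resid)

if q > Q_CRIT_95[len(resid)]:
    keep = np.arange(len(y)) != idx       # drop the single rejected point
    slope2, icpt2 = np.polyfit(x[keep], y[keep], 1)
    print(f"rejected x = {x[idx]:g} (Q = {q:.2f}); slope {slope:.2f} -> {slope2:.2f}")
else:
    print(f"no point rejected (Q = {q:.2f})")
```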