
Linear Correlation of 1 but Nonzero Residual Values in MATLAB


I have a set of data, X and Y, which I wish to fit a line to, get the R and R^2 values for, and also graph the residuals from the difference between the best fit line values and the actual data. Here's my MATLAB code that does this:

maxx = max(X); minx = min(X);
fitx = minx:maxx / 1000:maxx;
coeff = polyfit(X,Y,1);
fity = polyval(coeff,fitx);

temp = corrcoef(X,Y); 
R = temp(2); R_squared = R^2;

ysub = polyval(coeff,X); 
residuals = Y - ysub;

subplot(1,2,1);
plot(X,Y,'+',fitx,fity,'r')
xlabel(['R = ' num2str(R) '; R^2 = ' num2str(R_squared)]);

subplot(1,2,2);
bar(residuals);

So I tested it on what should be an "ideal" dataset that fits a line perfectly. Sure enough, I get R and R^2 values of 1, and my first plot looks fine, but my residuals range from 7000 to -3000. Shouldn't my residuals be 0 if my R value is 1?

What am I misunderstanding here?

Here is the sample dataset:

X = [100 200 290 390 480 580 670 760 860 950]
Y = 1.0e+07 * [0.2429 0.4929 0.7183 0.9689 1.1946 1.4453 1.6711 1.8968 2.1477 2.3735]

Solution

  • It would be easier to diagnose with a sample dataset.

    At a guess, the problem is that your first line should be:

    maxx = max(X); minx = min(X);
    

    The way you had it, minx = min(Y), distorts your fitx and fity values.
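    To see why, note that with the sample dataset from the question, min(Y) is about 2.4e6, far larger than max(X) = 950. A colon range whose start exceeds its stop is empty in MATLAB, so fitx (and therefore fity) would come out empty and no fitted line would be drawn at all. A minimal illustration (the _bad/_good variable names are just for comparison):

    ```matlab
    % Sample data from the question
    X = [100 200 290 390 480 580 670 760 860 950];
    Y = 1.0e+07 * [0.2429 0.4929 0.7183 0.9689 1.1946 1.4453 1.6711 1.8968 2.1477 2.3735];

    fitx_bad  = min(Y):max(X)/1000:max(X);   % start (~2.4e6) > stop (950): empty range
    fitx_good = min(X):max(X)/1000:max(X);   % spans the actual X data

    isempty(fitx_bad)    % true  -- nothing to plot
    isempty(fitx_good)   % false
    ```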

    Edit:

    Thank you for submitting the sample data. What you are seeing now is just rounding error. Your R isn't actually 1; it's just very close to it. Try:

     R-1
    

    The result for your data is -1.0301e-07, indicating that the correlation isn't quite perfect. If R were exactly 1, then you are correct that the residuals would all be zero. Your residuals are quite small given the size of your data (< 0.3% for the first point, and at least 10x smaller for the rest) and are consistent with your measured correlation coefficient.

    I think everything is working correctly.
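    As a quick sanity check, the whole chain can be reproduced in a few lines (a sketch using the sample dataset from the question):

    ```matlab
    X = [100 200 290 390 480 580 670 760 860 950];
    Y = 1.0e+07 * [0.2429 0.4929 0.7183 0.9689 1.1946 1.4453 1.6711 1.8968 2.1477 2.3735];

    coeff = polyfit(X, Y, 1);           % linear fit, as in the question
    residuals = Y - polyval(coeff, X);  % same residuals as in the question

    temp = corrcoef(X, Y);
    R = temp(2);

    R - 1                               % tiny but nonzero: rounding, not a perfect fit
    max(abs(residuals) ./ abs(Y))       % worst-case relative residual, well under 1%
    ```

    The relative residuals, not the raw ones, are the right thing to compare against R: residuals of a few thousand sound large in isolation, but against Y values on the order of 1e7 they are a fraction of a percent.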