Given two random variables/measurements (x, y), both measured with error (error-in-variables case),
is there a routine in MATLAB to calculate the estimators (a, b) of a regression line y(i)=a·x(i)+b using the method of orthogonal distance regression?
Here's my implementation of Maximum Likelihood estimator:
x= [1.0, 0.6, 1.2, 1.4, 0.2];
y=[0.5, 0.3, 0.7, 1.0, 0.2];
mx = mean(x);
my = mean(y);
p = (x(:) - mx) .^ 2;
q = (y(:) - mx) .^ 2;
w = p .* q;
sxx = sum(p);
syy = sum(q);
sxy = sum(w); w=p.*q; sxy=sum(w);
l = 1; %# orthogonal distance regression
a = (syy - l * syy + sqrt((syy - l * sxx) ^ 2 + 4 * l * sxy^2)) / (2 * sxy);
b = my - a * mx;
EDIT (addressed to EitanT):
Here's a comparison of my estimators and yours:
MATLAB doesn't have a built-in function exactly like that, but you can easily find the estimators a and b with svd
approximation1,2:
data = [x(:), y(:)];
[U, S, V] = svd(data - repmat(mean(data), size(data, 1), 1), 0);
a = -V(1, end) / V(2, end);
b = mean(data * V(:, end)) / V(2, end);
Which is in fact the orthogonal distance regression method.
EDIT #1:
Here's a plot of the original data, alongside my estimator and yours.
Your estimator is highly inaccurate, which brings me to believe that that your implementation is flawed.
EDIT #2:
Here's an updated plot if the computation of a
is corrected to:
a=(syy-l*syy+sqrt((syy-l*sxx)^2+4*l*sxy^2)) / (2*sxy); %# Forgot parentheses!
Closer, but still not as accurate as mine.
EDIT #1:
You can further improve the accuracy of sxx
, syy
and sxy
, like so:
cov_mat = cov(x, y);
sxx = cov_mat(1, 1); %# Same as: sxx = var(x);
syy = cov_mat(2, 2); %# Same as: syy = var(y);
sxy = cov_mat(1, 2); %# Same as: sxy = cov_mat(2, 1);
1 Gene H. Golub and Charles F. Van Loan (1996) "Matrix Computations" (3rd ed.). The Johns Hopkins University Press. pp 596.
2 http://en.wikipedia.org/wiki/Total_least_squares