python numpy scipy

Symmetry of Best Fit Straight Line on Inverting Axes


I have a set of data from which I created a scatter plot, with the best-fit straight line overlaid on top. Everything was fine until I realised that, because of the nature of the data, it made more conceptual sense to plot the y-axis data on the x-axis. So I inverted (swapped) the axes.

Say a1 and b1 are the intercept and slope of the original best-fit line, and a2 and b2 are the intercept and slope after inverting the axes and redoing the fit.

Given that the original slope, b1, was approximately 1, I would expect a2 to be approximately -a1.

I would also expect the small difference between b1 and 1 to be (very approximately) opposite in sign to the small difference between b2 and 1.

Originally I performed the best fit using numpy.polyfit with deg=1 (during my investigation I learned that this function belongs to NumPy's old polynomial API).

Before axes inversion, np.polyfit gave an intercept of 0.016 and a slope of 1.005.
After axes inversion, np.polyfit gave an intercept of 0.002 and a slope of 0.75.

So the intercept and slope of the new best-fit line do not have the properties I expected.

I then switched to stats.linregress.

Before axes inversion, stats.linregress gave an intercept of 0.016 and a slope of 1.005.
After axes inversion, stats.linregress gave an intercept of 0.002 and a slope of 0.75.

The two functions agree, so the mistake must be on my side.

As I see it, the possibilities are the following: 1) my assumptions about the symmetry of an axes inversion are incorrect, 2) there are additional parameters I should be passing to the best-fit function that would widen the range of fits it can return, or 3) some mysterious third thing.

So which is it?


Solution

  • If you try to predict the value of y for given values of x, you fit a function y = f(x) by minimizing the squared errors along the y axis (the vertical distances from the points to the line).

    You may instead want to predict the value of x for given values of y. In that case you fit a function x = g(y) by minimizing the squared errors along the x axis (the horizontal distances).

    These are two different problems and, in general, f and g are not inverses of each other. They coincide only when the points lie exactly on a straight line.
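
To see the asymmetry concretely, here is a minimal sketch using synthetic data (an assumption made for illustration; it is not the asker's data set). It fits y on x and x on y with stats.linregress and compares the second slope with the slope the inverse of the first line would have. For ordinary least squares the product of the two slopes equals the squared correlation coefficient r², so the two lines can only be inverses of each other when |r| = 1. If the same identity is applied to the numbers in the question, 1.005 × 0.75 suggests an r² of roughly 0.75, which would account for the gap.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration only): y is x plus noise,
# so the y-on-x slope comes out close to 1.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.6, size=500)

# f: regress y on x (minimises vertical residuals).
fit_yx = stats.linregress(x, y)

# g: regress x on y (minimises horizontal residuals).
fit_xy = stats.linregress(y, x)

# If g were the inverse of f, its slope would be 1 / slope_f
# and its intercept would be -intercept_f / slope_f.
inverse_slope = 1.0 / fit_yx.slope
inverse_intercept = -fit_yx.intercept / fit_yx.slope

print("y on x: slope", fit_yx.slope, "intercept", fit_yx.intercept)
print("x on y: slope", fit_xy.slope, "intercept", fit_xy.intercept)
print("inverse of f: slope", inverse_slope, "intercept", inverse_intercept)

# The product of the two fitted slopes equals r squared.
print("product of slopes:", fit_yx.slope * fit_xy.slope)
print("r squared:", fit_yx.rvalue ** 2)
```

With noisy data the x-on-y slope is smaller than 1 / slope_f, so both fitted slopes can sit below the "symmetric" value at the same time. No extra argument to polyfit or linregress changes this; it follows from which residuals are being minimised.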