python numpy scikit-learn goodness-of-fit

Significant mismatch between `r2_score` of `scikit-learn` and the R^2 calculation


Question

Why is there a significant difference between the r2_score function in scikit-learn and the formula for the Coefficient of Determination as described in Wikipedia? Which is the correct one?


Context

I'm using scikit-learn with Python 3.5 to fit linear and quadratic models, and one of the goodness-of-fit measures I'm trying out is the coefficient of determination (R^2). However, while testing, I found a marked difference between the r2_score metric in scikit-learn and the calculation provided in Wikipedia.


Code

I'm providing my code here as reference, which computes the example in the Wikipedia page linked above.

from sklearn.metrics import r2_score
import numpy

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

# Convert to numpy array and ensure double precision to avoid single precision errors
observed = numpy.array(y, dtype=numpy.float64)
predicted = numpy.array(f, dtype=numpy.float64)

sklearn_value = r2_score(observed, predicted)
print(sklearn_value)  # -3.8699999999999992

As is evident, the value calculated by scikit-learn is -3.8699999999999992, while the reference value in Wikipedia is 0.998.

Thank you!

UPDATE: This is different from the question about how R^2 is calculated in scikit-learn, because what I'm trying to understand is the discrepancy between the two results. That question states that the formula used in scikit-learn is the same as Wikipedia's, which should not result in different values.

UPDATE #2: It turns out I made a mistake reading the Wikipedia article's example. Answers and comments below point out that the example is for the linear least-squares fit of the (x, y) values. For that fit, the R^2 value of 0.998 given in Wikipedia is correct. For the R^2 between the two vectors taken directly as observed and predicted values, scikit-learn's answer is also correct. Thanks a lot for your help!
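To make the point of UPDATE #2 concrete, here is a minimal sketch of the other reading of the data. It assumes the two vectors from the question are the (x, y) points of the Wikipedia example rather than observed/predicted pairs, and it uses `numpy.polyfit` for the least-squares line; the variable names are only illustrative.

```python
import numpy as np

# Assumption: the question's two vectors are the (x, y) data points
# of the Wikipedia example, not observed/predicted pairs.
x = np.array([1, 2, 3, 4, 5], dtype=np.float64)
y = np.array([1.9, 3.7, 5.8, 8.0, 9.6], dtype=np.float64)

# Linear least-squares fit: y ~ slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept

# R^2 between the data and the fitted line
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(slope, intercept)  # roughly 1.97 and -0.11
print(round(r2, 3))      # 0.998
```

Under that assumption the result matches the 0.998 quoted from Wikipedia, and passing `y` and `fitted` to `r2_score` should give the same number, since it applies the same formula.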


Solution

  • The referenced question is correct. If you work through the calculation of the residual sum of squares and the total sum of squares, you get the same value as sklearn:

    In [85]: import numpy as np
    
    In [86]: y = [1,2,3,4,5]
    
    In [87]: f = [1.9, 3.7, 5.8, 8.0, 9.6]
    
    In [88]: SSres = sum(map(lambda x: (x[0]-x[1])**2, zip(y, f)))
    
    In [89]: SStot = sum([(x-np.mean(y))**2 for x in y])
    
    In [90]: SSres, SStot
    Out[90]: (48.699999999999996, 10.0)
    
    In [91]: 1-(SSres/SStot)
    Out[91]: -3.8699999999999992
    

    The idea behind a negative value is that you'd have been closer to the actual values had you just predicted the mean each time (which would correspond to an r2 = 0).
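The interpretation of a negative value can be checked with a small sketch in plain Python. The helper `r_squared` below is a hypothetical name written for this illustration (it is not part of scikit-learn); it applies 1 - SSres/SStot to the question's data, to a constant prediction of the mean, and to a perfect prediction.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSres / SStot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

# Always predicting the mean of y (3.0) is the baseline
mean_only = [sum(y) / len(y)] * len(y)

r2_model = r_squared(y, f)            # about -3.87: worse than the baseline
r2_mean = r_squared(y, mean_only)     # 0.0: exactly the baseline
r2_perfect = r_squared(y, y)          # 1.0: predictions equal the observations

print(r2_model, r2_mean, r2_perfect)
```

Here SSres for `f` is about 48.7 against an SStot of 10, so the predictions are almost five times worse, in squared error, than simply guessing the mean.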