Minimal example of rpy2 regression using pandas data frame


What is the recommended way (if any) to do linear regression with rpy2 on a pandas DataFrame? My approach works, but it seems very elaborate. Am I making things unnecessarily complicated?

The R code, for comparison:

x <- c(1,2,3,4,5)
y <- c(2,1,3,5,4)
M <- lm(y~x)
summary(M)$coefficients
            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

Now my Python (2.7.10), rpy2 (2.6.0), and pandas (0.16.1) version:

import pandas
import pandas.rpy.common as common
from rpy2 import robjects
from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats')

dataframe = pandas.DataFrame({'x': [1,2,3,4,5], 
                              'y': [2,1,3,5,4]})

# Convert the pandas DataFrame to an R data.frame and bind it in R's global environment
robjects.globalenv['dataframe'] = common.convert_to_r_dataframe(dataframe)

M = stats.lm('y~x', data=base.as_symbol('dataframe'))

print(base.summary(M).rx2('coefficients'))

            Estimate Std. Error  t value  Pr(>|t|)
(Intercept)      0.6  1.1489125 0.522233 0.6376181
x                0.8  0.3464102 2.309401 0.1040880

By the way, importing pandas.rpy.common gives me a FutureWarning. However, when I tried pandas2ri.py2ri(dataframe) to convert the dataframe from pandas to R (as mentioned here), I got:

NotImplementedError: Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.series.Series'>'

Solution

  • The R and Python versions are not strictly identical: in Python/rpy2 you build a data frame, whereas in R you use plain vectors (without a data frame).

    Otherwise, the conversion that ships with rpy2 appears to work here:

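    # robjects, base, stats, and dataframe are defined as in the question
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()  # convert pandas objects automatically when they are passed to R
    robjects.globalenv['dataframe'] = dataframe
    M = stats.lm('y~x', data=base.as_symbol('dataframe'))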

    The result:

    >>> print(base.summary(M).rx2('coefficients'))
                Estimate Std. Error  t value  Pr(>|t|)
    (Intercept)      0.6  1.1489125 0.522233 0.6376181
    x                0.8  0.3464102 2.309401 0.1040880
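
As a sanity check that does not depend on rpy2 or R at all, the estimates, standard errors, and t values in the table above can be reproduced with the textbook simple-regression formulas. This is only a minimal sketch in plain Python. The variable names are illustrative, and the p-value column is left out because the standard library has no t-distribution function.

```python
import math

# Same data as in the question
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 5, 4]
n = len(x)

x_mean = sum(x) / float(n)
y_mean = sum(y) / float(n)

# Sums of squares and cross-products around the means
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Ordinary least squares estimates
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual variance with n - 2 degrees of freedom
rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sigma2 = rss / (n - 2)

# Standard errors of the two coefficients
se_slope = math.sqrt(sigma2 / sxx)
se_intercept = math.sqrt(sigma2 * (1.0 / n + x_mean ** 2 / sxx))

# t values are estimate divided by standard error
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print("(Intercept) %.7f %.7f %.6f" % (intercept, se_intercept, t_intercept))
print("x           %.7f %.7f %.6f" % (slope, se_slope, t_slope))
```

The printed values should agree with the first three columns of the R table to the precision shown, which confirms that both the R and the rpy2 routes fit the same model.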