r python-3.x pandas dataframe rpy2

Use rpy2 with pandas dataframe


I want to apply an R function to a pandas DataFrame:

df = pd.DataFrame( np.random.randn(5,2), # 5 rows, 2 columns
               columns = ["A","B"], # name of columns
               index = ["Max", "Nathy", "Tom", "Joe", "Kathy"] )

How can I apply, for example, the summary function from R?

I have the following code:

import numpy as np
import pandas as pd

import rpy2
# print(rpy2.__version__) ## 2.9.4

from rpy2.rinterface import R_VERSION_BUILD
# print(R_VERSION_BUILD) ## ('3', '5.1', '', 74947)

from rpy2.robjects.packages import importr
# import R's "base" package
base = importr('base')

Solution

  • You are almost there. To call R functions on the data, first convert the pandas DataFrame to an R data.frame. Once you have the R object, you can pass it to R functions as shown below.

    import numpy as np
    import pandas as pd
    
    from rpy2.robjects.packages import importr
    base = importr('base')  # import R's "base" package
    
    # pandas2ri provides the pandas <-> R conversion
    # (install any missing dependency if you get a "module not found" error)
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    
    # Create the pandas DataFrame
    df = pd.DataFrame(np.random.randn(5, 2),  # 5 rows, 2 columns
                      columns=["A", "B"],     # column names
                      index=["Max", "Nathy", "Tom", "Joe", "Kathy"])
    
    # Convert pandas -> R. py2ri is the rpy2 2.9.x name (the version in the
    # question); rpy2 3.x renamed it to py2rpy.
    r_df = pandas2ri.py2ri(df)
    print(type(r_df))
    
    # Call a function from R's base package
    print(base.summary(r_df))