Tags: python, r, pandas, rpy2

Rpy2: pandas dataframe can't fit in R


I need to read a CSV file with Python (into a pandas dataframe), work on it in R, and then return to Python. To convert the pandas dataframe to an R data frame I use rpy2, and that part works (code below).

from pandas import read_csv, DataFrame
import pandas.rpy.common as com
import rpy2.robjects as robjects

r = robjects.r
r.library("fitdistrplus")

df = read_csv('./datos.csv')
r_df = com.convert_to_r_dataframe(df)
print(type(r_df))

The output is:

<class 'rpy2.robjects.vectors.FloatVector'>

Then I try to fit a distribution in R:

fit2 = r.fitdist(r_df, "weibull")

But I get this error:

RRuntimeError: Error in (function (data, distr, method = c("mle", "mme", "qme", "mge"),  : 
data must be a numeric vector of length greater than 1

I have two questions:
1. What am I doing wrong?
2. Is this the most efficient way to pass a pandas dataframe to R? I ask because I have also seen this import: from rpy2.robjects.packages import importr

This is the data that I read: https://mega.co.nz/#!P8MEDSzQ!iQyxt73a5pRvJNOxWeSEaFlsVS7_A1sZCAXkUFBLJa0

I use IPython 2.1. Thanks!


Solution

  • You have two issues:

    First, you are trying to use a data frame where you really need a vector. (If you tried using an R data.frame for fitdist(), you'd also get an error.)

    Second, the pandas<->rpy2 support provided by pandas is buggy, resulting in conversion of your (presumably) numeric pandas data frame to a string/character R data frame:

    In [27]: r.sapply(r_df, r["class"])
    Out[27]: 
    <StrVector - Python:0x1097757a0 / R:0x7fa41c6b0b68>
    [str, str, str, str]
    

    This is not good! The following code fixes these errors:

    from pandas import read_csv
    import rpy2.robjects as robjects
    
    r = robjects.r
    r.library("fitdistrplus")
    
    # Read the CSV file and squeeze the single data column into a Series,
    # rather than a DataFrame (read_csv's squeeze=True was removed in pandas 2.0)
    series = read_csv('datos.csv', index_col=0).squeeze("columns")
    
    # do the conversion directly, so that we get an R Vector, rather than a 
    # data frame, and we know that it's a numeric type
    r_vec = robjects.FloatVector(series)
    
    fit2 = r.fitdist(r_vec, "weibull")
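
Before handing the data to R, it can help to confirm on the pandas side that you really have a one-dimensional, numeric object with more than one value, since that is what the fitdist() error message asks for. The sketch below is only an illustration: the CSV text is a made-up stand-in for datos.csv (assumed to hold an index column plus one numeric column), and the names csv_text, is_numeric and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for datos.csv: an index column plus one numeric column.
csv_text = "idx,value\n0,1.5\n1,2.25\n2,3.0\n3,4.75\n"

# A plain read gives a two-dimensional DataFrame, which is not a vector.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Squeezing the single column gives a one-dimensional Series.
series = df.squeeze("columns")

# Check that the values are numeric, then take them as plain Python floats.
is_numeric = pd.api.types.is_numeric_dtype(series)
values = series.astype(float).tolist()

print(type(df).__name__, df.shape)
print(type(series).__name__, series.shape)
print(is_numeric, len(values))
```

If is_numeric is True and there is more than one value, the Series (or the values list) should be safe to pass to robjects.FloatVector as in the code above.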