python, r, pandas, dataframe, reticulate

Why is a dataframe from a dictionary behaving differently in R from a standalone dataframe via reticulate?


I am using reticulate from within R and trying to convert a pandas dataframe that is stored in a Python dictionary into an R dataframe, but the conversion does not happen and I'm not sure why. I want to be able to access the dataframe columns using R syntax (i.e. $). When I generate a standalone dataframe in Python and return it to R, I have no problems.

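In Python (py_script.py):

import numpy as np
import pandas as pd

def createDataFrame(x):
    # x-by-x dataframe filled with the value x
    a = (x, x)
    b = pd.DataFrame(np.ones(a) * x)
    return b

def createDictionary(x):
    # Same dataframe, but stored under a dictionary key
    dict1 = {}
    a = (x, x)
    b = pd.DataFrame(np.ones(a) * x)
    dict1['test'] = pd.DataFrame(b)
    return dict1

df = createDataFrame(3)
Dict = createDictionary(3)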

In R, using the reticulate package:

library(reticulate)
source_python("py_script.py")
df$'1'

R_Df <- Dict$test
R_Df$'1'

I would expect df$'1' and R_Df$'1' to generate the same output, a column vector from the relevant dataframe. But the second call returns nothing useful; instead I get the following:

Error generated:

Error in py_get_attr_impl(x, name, silent) : AttributeError: 'DataFrame' object has no attribute '1'

Could anyone explain why this happens, and suggest a way of manipulating objects from dictionaries in R? Thanks in advance.


Solution

  • Looking at the object, R_Df has the classes shown below:

    > class(R_Df)
    [1] "pandas.core.frame.DataFrame"       
    [2] "pandas.core.generic.NDFrame"       
    [3] "pandas.core.base.PandasObject"     
    [4] "pandas.core.base.StringMixin"      
    [5] "pandas.core.accessor.DirNamesMixin"
    [6] "pandas.core.base.SelectionMixin"   
    [7] "python.builtin.object"   
    

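    This is still a Python pandas object, not an R dataframe. On such an object, $ performs a Python attribute lookup, which is why you get the AttributeError.

    I suggest two methods.

    Method 1. A workaround using rjson

    You can work around this by round-tripping the dataframe through JSON with rjson: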

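    reticulate::source_python('code.py')
    library(rjson)
    library(data.table)  # data.table is used here because of the numeric column names
    R_Df <- Dict$test    # still a pandas DataFrame (a Python object) at this point
    # Serialise to JSON in Python, parse it in R, and bind the rows into one table
    R_Df <- rbindlist(lapply(fromJSON(R_Df$to_json(orient='records')), as.data.table))
    > R_Df$'1'
    [1] 3 3 3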
    

    Method 2. Use the development version

    The developers have merged support for converting to and from pandas dataframes into master. You can get it by installing the development version from GitHub:

    devtools::install_github("rstudio/reticulate")
    library(reticulate)
    reticulate::source_python('code.py')
    R_Df <- Dict$test
    > R_Df$'1'
    [1] 3 3 3
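
To see what is going on from the Python side, here is a minimal sketch that rebuilds the same 3x3 frame and reproduces both the failure and the JSON shape that Method 1 relies on. It assumes that $ on an unconverted object behaves roughly like Python's getattr, which is what the py_get_attr_impl error message suggests. The variable names are illustrative only and do not come from the original scripts.

```python
import json

import pandas as pd

# Same shape as the frame in the question: 3x3, filled with 3.0,
# with pandas' default integer column labels 0, 1, 2.
df = pd.DataFrame([[3.0] * 3 for _ in range(3)])

# Assumption: `$` on an unconverted Python object is an attribute lookup,
# roughly getattr(df, "1"). The column label is the integer 1, which is
# not an attribute of the DataFrame, so this raises AttributeError.
try:
    getattr(df, "1")
    attribute_lookup_failed = False
except AttributeError:
    attribute_lookup_failed = True

# Item access with the integer label is what actually selects the column.
column_one = df[1].tolist()

# orient="records" serialises one JSON object per row; the integer
# column labels become string keys such as "1".
records = json.loads(df.to_json(orient="records"))

print(attribute_lookup_failed)
print(column_one)
print(records[0])
```

Because the JSON keys are strings, the table rebuilt in R ends up with a column named "1", which is why R_Df$'1' works after the round trip.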