
Converting a Python dataframe character column to R with rpy2


I am trying to convert a pandas dataframe to an R data frame with rpy2, but I cannot get a date column in the pandas dataframe to arrive as a date type in the R data frame.

When I convert a column produced by pd.to_datetime() to an R data frame, the conversion does not come out correctly.

The df date column in question:

     event_time
0    2019-10-11
1    2020-01-01
2    2019-11-15
3    2020-03-05

Conversion code:

with localconverter(ro.default_converter + pandas2ri.converter):

    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)

Produces:

event_time: <class 'numpy.ndarray'>
  array([737343., 737425., 737378., 737489.])

And the same thing for discharge_time.

Conversion code that keeps the column as a string and then attempts to convert it to a date on the R side:

with localconverter(ro.default_converter + pandas2ri.converter):

    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    #### df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)

    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d'))

Produces a dataframe with:

event_time: <class 'numpy.ndarray'>
  array(['2019-10-11', '2020-01-01', '2019-11-15', '2020-03-05'], dtype='<U10')

But this line of code r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d')) errors with:

AttributeError: 'numpy.ndarray' object has no attribute 'index'

Using rx2 to look up the column instead:

with localconverter(ro.default_converter + pandas2ri.converter):

    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    #### df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)

    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.rx2('event_time')], '%Y-%m-%d'))

Error:

Conversion 'py2rpy' not defined for objects of type '<class 'numpy.ndarray'>'

So how do I get a date from a pandas dataframe into a date in R with rpy2? I need it as a date type because I will be doing date calculations later on, and strings will not work for that.

Versions:

pandas==1.0.1

rpy2~=3.3.5


Solution

  • Your problem has nothing to do with rpy2; you are just parsing the dates incorrectly in pandas. See:

    from pandas import DataFrame, to_datetime
    
    df = DataFrame(dict(event_time=['2019-10-11', '2020-01-01']))
    
    df.event_time = to_datetime(df.event_time)
    
    print(list(df.event_time))
    # [Timestamp('2019-10-11 00:00:00'), Timestamp('2020-01-01 00:00:00')]
    
    # by using dt.strftime you were just converting them back to strings, see:
    print(list(df.event_time.dt.strftime("%Y-%m-%d")))
    # ['2019-10-11', '2020-01-01']
    
    # now you could extract date objects (but don't! timestamps are fine for rpy2)
    print(list(df.event_time.dt.date))
    # [datetime.date(2019, 10, 11), datetime.date(2020, 1, 1)]
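
As a side note on why extracting date objects goes wrong: the floats shown in the question (737343., 737425., ...) match the proleptic Gregorian ordinals that datetime.date.toordinal() returns for the original dates. That suggests the date objects ended up as plain day counts rather than as an R Date; the exact path rpy2 takes to get there is an assumption here, not something the question confirms. A minimal standard-library sketch that decodes the numbers:

```python
from datetime import date

# Floats reported in the question for event_time after using .dt.date
observed = [737343.0, 737425.0, 737378.0, 737489.0]

# The dates that were in the original pandas column
expected = [date(2019, 10, 11), date(2020, 1, 1), date(2019, 11, 15), date(2020, 3, 5)]

# date.fromordinal() inverts date.toordinal(), where day 1 is 0001-01-01
decoded = [date.fromordinal(int(value)) for value in observed]

for value, day in zip(observed, decoded):
    print(int(value), "->", day.isoformat())

# R's Date class counts days since 1970-01-01, so the same dates
# expressed on R's scale differ by a constant offset
R_EPOCH_ORDINAL = date(1970, 1, 1).toordinal()
days_since_r_epoch = [int(value) - R_EPOCH_ORDINAL for value in observed]
print(days_since_r_epoch)
```

The decoded values line up with the four dates from the question, so no information was lost, but R would see ordinary numbers on a different scale from its own Date class.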
    

    Now in rpy2 you simply do:

    from rpy2.robjects import conversion, default_converter, pandas2ri
    from rpy2.robjects.conversion import localconverter
    
    
    with localconverter(default_converter + pandas2ri.converter):
        df_r = conversion.py2rpy(df)
    
    print(repr(df_r.rx2('event_time')))
    # R object with classes: ('POSIXct', 'POSIXt') mapped to:
    # [2019-10-11, 2020-01-01]
    

    Now you can have fun handling the dates on the R side; see dates. Also, if you happen to use Jupyter notebooks, the conversion is much handier with cell magics.
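
Before handing a dataframe to rpy2, it may help to confirm that the column still holds pandas timestamps rather than strings or date objects. The sketch below uses is_datetime64_any_dtype from pandas.api.types; the helper name is_timestamp_column is hypothetical and only there for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame(
    {"event_time": ["2019-10-11", "2020-01-01", "2019-11-15", "2020-03-05"]}
)

parsed = pd.to_datetime(df["event_time"])    # pandas timestamps
as_strings = parsed.dt.strftime("%Y-%m-%d")  # formatted back into strings
as_dates = parsed.dt.date                    # Python datetime.date objects


def is_timestamp_column(series):
    """Hypothetical helper: True if the Series has a datetime64 dtype."""
    return is_datetime64_any_dtype(series)


checks = {
    "parsed": is_timestamp_column(parsed),
    "as_strings": is_timestamp_column(as_strings),
    "as_dates": is_timestamp_column(as_dates),
}
print(checks)
```

Only the column that comes straight from pd.to_datetime() passes the check; the strftime and .dt.date variants from the question do not, which is consistent with them not arriving in R as date-time values.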