I am trying to convert a pandas DataFrame to an R data.frame with rpy2, and I cannot get a date column in the pandas DataFrame converted to a Date type in R.
When I convert a column parsed with pd.to_datetime() to an R data.frame, the conversion is not correct.
Date column in question from df:
event_time
0 2019-10-11
1 2020-01-01
2 2019-11-15
3 2020-03-05
Conversion code:
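import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
with localconverter(ro.default_converter + pandas2ri.converter):
    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)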
Produces:
event_time: <class 'numpy.ndarray'>
array([737343., 737425., 737378., 737489.])
The same happens for discharge_time.
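The numbers in that array are not random: they match Python's proleptic Gregorian ordinals for the same dates (the value `datetime.date.toordinal()` returns). That suggests the `datetime.date` objects were turned into plain day counts somewhere along the way instead of into an R Date. A quick standard-library check:

```python
from datetime import date

# The four dates from the question's event_time column
dates = [date(2019, 10, 11), date(2020, 1, 1), date(2019, 11, 15), date(2020, 3, 5)]

# toordinal() counts days in the proleptic Gregorian calendar, with 0001-01-01 as day 1
ordinals = [d.toordinal() for d in dates]
print(ordinals)
# [737343, 737425, 737378, 737489]

# The ordinals round-trip back to the original dates
assert [date.fromordinal(n) for n in ordinals] == dates
```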
Converting to a string first and then attempting the conversion on the R side:
from rpy2.robjects.packages import importr
base = importr('base')
with localconverter(ro.default_converter + pandas2ri.converter):
    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    # df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)
    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d'))
Produces a dataframe with:
event_time: <class 'numpy.ndarray'>
array(['2019-10-11', '2020-01-01', '2019-11-15', '2020-03-05'], dtype='<U10')
But this line:
r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.names.index('event_time')], '%Y-%m-%d'))
errors with:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
Using rx2 instead:
with localconverter(ro.default_converter + pandas2ri.converter):
    df['event_time'] = pd.to_datetime(df['event_time']).dt.strftime("%Y-%m-%d")
    # df["event_time"] = pd.to_datetime(df["event_time"]).dt.date
    r_df = ro.conversion.py2rpy(df)
    r_df = base.cbind(r_df, event_time = base.as_Date(r_df[r_df.rx2('event_time')], '%Y-%m-%d'))
fails with the error:
Conversion 'py2rpy' not defined for objects of type '<class 'numpy.ndarray'>'
So how do I get a date from a pandas DataFrame into a Date in R with rpy2? I need an actual date type because I will be doing date calculations later on, and strings will not work for that.
Versions:
pandas==1.0.1
rpy2~=3.3.5
Your problem has nothing to do with rpy2; you are just converting the dates incorrectly in pandas. See:
from pandas import DataFrame, to_datetime
df = DataFrame(dict(event_time=['2019-10-11', '2020-01-01']))
df.event_time = to_datetime(df.event_time)
print(list(df.event_time))
# [Timestamp('2019-10-11 00:00:00'), Timestamp('2020-01-01 00:00:00')]
# by using dt.strftime you were just converting them back to strings:
print(list(df.event_time.dt.strftime("%Y-%m-%d")))
# ['2019-10-11', '2020-01-01']
# you could also extract datetime.date objects (but don't! Timestamps are fine for rpy2)
print(list(df.event_time.dt.date))
# [datetime.date(2019, 10, 11), datetime.date(2020, 1, 1)]
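The difference comes down to the column's dtype. As far as I can tell, rpy2's pandas converter chooses the R type based on that dtype, so what matters is that the column is still datetime64 when py2rpy is called. A pandas-only check of what each step does to the dtype:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame(dict(event_time=['2019-10-11', '2020-01-01']))
df["event_time"] = pd.to_datetime(df["event_time"])

# After to_datetime the column has a real datetime64 dtype
print(is_datetime64_any_dtype(df["event_time"]))
# True

# .dt.strftime turns it back into strings and .dt.date into Python date objects;
# either way the datetime64 dtype is lost
as_strings = df["event_time"].dt.strftime("%Y-%m-%d")
as_dates = df["event_time"].dt.date
print(is_datetime64_any_dtype(as_strings), is_datetime64_any_dtype(as_dates))
# False False
```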
Now in rpy2 you simply do:
from rpy2.robjects import conversion, default_converter, pandas2ri
from rpy2.robjects.conversion import localconverter
with localconverter(default_converter + pandas2ri.converter):
    df_r = conversion.py2rpy(df)
    print(repr(df_r.rx2('event_time')))
# R object with classes: ('POSIXct', 'POSIXt') mapped to:
# [2019-10-11, 2020-01-01]
Now you can handle the dates on the R side (see the R documentation on Dates). Also, if you happen to use Jupyter notebooks, conversion is much handier with the rpy2 cell magics.
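If you specifically want R's Date class (whole days) rather than POSIXct, base R's as.Date() can convert the column once it is on the R side. R stores a Date as the number of days since 1970-01-01. The pandas sketch below computes those same day counts, which can serve as a sanity check on what should arrive in R (it does not call rpy2 itself):

```python
import pandas as pd

df = pd.DataFrame(dict(event_time=['2019-10-11', '2020-01-01']))
df["event_time"] = pd.to_datetime(df["event_time"])

# R's Date class counts whole days since 1970-01-01 (the Unix epoch)
epoch = pd.Timestamp("1970-01-01")
days_since_epoch = (df["event_time"] - epoch).dt.days
print(list(days_since_epoch))
# [18180, 18262]
```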