Search code examples
pythonrrpy2

Unexpected behavior using as.Date() in rpy2


I have some code in Python that generates a dataframe with dates based on an R package (timeDate), but I cannot get the date/time converted to a date only (to exclude the time portion).

I have tried using as.Date and instead of generating a date, it generates a number. I ran the same code in R directly (ensuring it's the one that rpy2 is using) and it correctly generates dates. But In rpy2, it doesn't work. The R version is 3.5.1 and Python is 3.7.0.

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()

ro.r('library(timeDate)')
ro.r('df <- data.frame(holiday="USChristmasDay", date=USChristmasDay(2010:2025))')
ro.r("names(df) <- c('holiday', 'date')")
ro.r("df$year = as.integer(format(df$date, '%Y'))")
ro.r("df$date_str = format(df$date, '%Y-%m-%d')")
ro.r("df$date_dt = as.Date(df$date, '%Y-%m-%d')")
ro.r("df$date_dt2 = as.Date(df$date_str, '%Y-%m-%d')")
ro.r("df$date_dt3 = as.Date(format(df$date, '%Y-%m-%d'), '%Y-%m-%d')")
ro.r("df$date_dt4 = as.Date(df$date, tz='America/Los_Angeles')")
df = ro.r('df')
print(df)

USChristmasDay is a function that returns a vector of all the dates of Christmas between the years specified.

The date_str correctly comes out formatted as specified. When running in R directly, I get dates that appear like 2025-12-25 in all four dt columns. I expect the same in python with rpy2, but instead I get numbers like 20447.0 in all four columns.


Solution

  • Since R code worked, use it directly inside Python and not the ro interface.

    from rpy2.robjects.packages import importr
    
    base = importr("base")
    timeDate = import("timeDate")
    
    df = base.data_frame(holiday="USChristmasDay", date=timeDate.USChristmasDay(base.seq(2010,2025)))
    
    df = base.cbind(df, year = base.as_integer(base.format(df[df.names.index('date')], "%Y")))
    df = base.cbind(df, date_str = base.format(df[df.names.index('date')], '%Y-%m-%d'))
    df = base.cbind(df, date_dt = base.as_Date(df[df.names.index('date')], '%Y-%m-%d'))
    df = base.cbind(df, date_dt2 = base.as_Date(df[df.names.index('date_str')], '%Y-%m-%d'))
    df = base.cbind(df, date_dt3 = base.as_Date(base.format(df[df.names.index('date')], '%Y-%m-%d'), '%Y-%m-%d'))
    df = base.cbind(df, date_dt4 = base.as_Date(df[df.names.index('date')], tz='America/Los_Angeles'))
    
    print(df)
    

    Additionally, the number conversion appears to be due to using as.Date which does not include the time component. Instead use the datetime types: as.POSIXct or non-time zone version, as.POSIXlt. These appear to translate properly with ro and pandas2ri.