python r ggplot2 rpy2

rpy2 ggplot2 Error: Invalid input: date_trans works with objects of class Date only


I am trying to use the rpy2 package to call ggplot2 from a Python script to plot time series data. I get an error when I try to adjust the date limits of the x-scale. The rpy2 documentation provides this guidance (https://rpy2.readthedocs.io/en/version_2.8.x/vector.html?highlight=date%20vector): "Sequences of date or time points can be stored in POSIXlt or POSIXct objects. Both can be created from Python sequences of time.struct_time objects or from R objects."

Here is my example code:

import numpy as np
import pandas as pd
import datetime as dt
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri
import rpy2.robjects.lib.ggplot2 as ggplot2
pandas2ri.activate()

#Create a random dataframe with time series data
df = pd.DataFrame({'Data': np.random.normal(50, 5, 10),
                  'Time': [dt.datetime(2000, 7, 23), dt.datetime(2001, 7, 15),
                           dt.datetime(2002, 7, 30), dt.datetime(2003, 8, 5),
                           dt.datetime(2004, 6, 28), dt.datetime(2005, 7, 23),
                           dt.datetime(2006, 7, 15), dt.datetime(2007, 7, 30),
                           dt.datetime(2008, 8, 5), dt.datetime(2009, 6, 28)]})

#Create a POSIXct vector from time.struct_time objects to store the x limits
date_min = dt.datetime(2000, 1, 1).timetuple()
date_max = dt.datetime(2010, 1, 1).timetuple()
date_range = ro.vectors.POSIXct((date_min, date_max))

#Generate the plot
gp = ggplot2.ggplot(df)
gp = (gp + ggplot2.aes_string(x='Time', y='Data') +
      ggplot2.geom_point() +
      ggplot2.scale_x_date(limits=date_range))

When I run this code, I get the following error message:

Error: Invalid input: date_trans works with objects of class Date only

Instead of the POSIXct object, I have also tried the DateVector object. I have also tried using base.as_Date() to convert date strings into R dates and feeding those into the R vector objects. I always get the same error message. In R, I would change the scale limits like this:

gp + scale_x_date(limits = as.Date(c("2000/01/01", "2010/01/01")))

How do I translate this into rpy2 so that my Python script will run?


Solution

  • Consider calling base R functions the way you would in R, by importing base as a package in rpy2. (In an R session, base, stats, utils, and the other built-in packages are loaded implicitly, without any library lines.)

    Datetime Processing

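    Also, convert the Python datetime objects to strings with strftime instead of timetuple(), which translates more easily to R. Because the Time column reaches R as a date-time (POSIXct) vector rather than a Date vector, use scale_x_datetime with POSIXct limits in place of scale_x_date; scale_x_date accepts Date objects only, which is what the error message reports.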

    from rpy2.robjects.packages import importr

    base = importr('base')
    ...
    date_min = dt.datetime(2000, 1, 1).strftime('%Y-%m-%d')
    date_max = dt.datetime(2010, 1, 1).strftime('%Y-%m-%d')
    date_range = base.as_POSIXct(base.c(date_min, date_max), format="%Y-%m-%d")
    ...
    ggplot2.scale_x_datetime(limits=date_range))
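
The string conversion in the snippet above can be wrapped in a small helper. This is a minimal sketch in plain Python: to_r_date_strings is a hypothetical name (not part of rpy2), and it assumes the resulting strings are then passed to base.as_POSIXct with the same format string, as in the snippet.

```python
import datetime as dt

# Same format string that is later passed to as.POSIXct on the R side.
R_DATE_FORMAT = "%Y-%m-%d"


def to_r_date_strings(limits, fmt=R_DATE_FORMAT):
    """Format datetime (or date) objects as strings that R can parse with `fmt`.

    Hypothetical helper for illustration; it does not call R itself.
    """
    return [d.strftime(fmt) for d in limits]


limits = to_r_date_strings([dt.datetime(2000, 1, 1), dt.datetime(2010, 1, 1)])
print(limits)  # ['2000-01-01', '2010-01-01']
```

Keeping the format string in one constant avoids a mismatch between what Python writes and what R is told to parse.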
    

    GGPlot Plus Operator

    Additionally, Python's + operator is not the same as ggplot2's, which is really the method ggplot2:::`+.gg`. As pointed out in the SO post "How is ggplot2 plus operator defined?", this function conditionally runs add_theme() or add_ggplot(), which you need to replicate in Python. Because the function is internal to the ggplot2 namespace and not reachable through ggplot2.* calls, use R's utils::getAnywhere("+.gg") to retrieve it as a callable object.

    Consequently, you need to replace each + with an explicit function call that fits Python's object model. You can do so with base R's Reduce, which applies a two-argument function across a list from left to right. So the following in R:

    gp <- ggplot(df)
    gp <- gp + aes_string(x='Time', y='Data') +
      geom_point() +
      scale_x_datetime(limits=date_range)
    

    translates equivalently to:

    gp <- Reduce(ggplot2:::`+.gg`, list(ggplot(df), aes_string(x='Time', y='Data'), 
                                        geom_point(), scale_x_datetime(limits=date_range)))
    

    Or, with getAnywhere() after the ggplot2 library is loaded in the session:

    gg_proc <- getAnywhere("+.gg")
    
    gp <- Reduce(gg_proc$objs[[1]], list(ggplot(df), aes_string(x='Time', y='Data'), 
                                         geom_point(), scale_x_datetime(limits=date_range)))
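
To see why Reduce is equivalent to a chain of + calls, here is a toy sketch in plain Python using functools.reduce, which performs the same kind of left fold. ToyPlot and add_layer are hypothetical stand-ins for a ggplot object and `+.gg`; nothing here calls R or ggplot2.

```python
from functools import reduce


class ToyPlot:
    """Stand-in for a plot object; it only records the layers added to it."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)


def add_layer(plot, layer):
    """Two-argument function playing the role of `+.gg`: returns a new plot."""
    return ToyPlot(plot.layers + (layer,))


parts = [ToyPlot(), "aes", "geom_point", "scale_x_datetime"]

# Left fold: add_layer(add_layer(add_layer(parts[0], "aes"), "geom_point"), ...)
folded = reduce(add_layer, parts)

# The same thing written out as nested calls, mirroring plot + a + b + c.
chained = add_layer(
    add_layer(add_layer(ToyPlot(), "aes"), "geom_point"), "scale_x_datetime"
)

print(folded.layers == chained.layers)  # True
```

The first list element acts as the starting plot and every later element is added to the running result, which is the order the + chain would apply them in.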
    

    Rpy2

    Below is the full code in rpy2. Because the R objects are driven from a Python script running non-interactively, plots are not shown on screen and need to be saved to disk, which can be done with ggsave:

    import numpy as np
    import pandas as pd
    import datetime as dt
    
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.packages import importr
    
    # IMPORT R PACKAGES
    base = importr('base')
    utils = importr('utils')
    ggplot2 = importr('ggplot2')
    
    pandas2ri.activate()
    
    # CREATE RANDOM (SEEDED) DATAFRAME WITH TIME SERIES DATA
    np.random.seed(6252018)
    df = pd.DataFrame({'Data': np.random.normal(50, 5, 10),
                       'Time': [dt.datetime(2000, 7, 23), dt.datetime(2001, 7, 15),
                                dt.datetime(2002, 7, 30), dt.datetime(2003, 8, 5),
                                dt.datetime(2004, 6, 28), dt.datetime(2005, 7, 23),
                                dt.datetime(2006, 7, 15), dt.datetime(2007, 7, 30),
                                dt.datetime(2008, 8, 5), dt.datetime(2009, 6, 28)]})
    
    # CONVERT TO POSIXct VECTOR
    date_min = dt.datetime(2000, 1, 1).strftime('%Y-%m-%d')
    date_max = dt.datetime(2010, 1, 1).strftime('%Y-%m-%d')
    date_range = base.as_POSIXct(base.c(date_min, date_max), format="%Y-%m-%d")
    
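    # RETRIEVE NEEDED FUNCTION
    # getAnywhere() returns an R list (name, objs, where, ...); element [1] is
    # objs, the list of matching functions, so [1][0] below is the `+.gg` method
    gg_plot_func = utils.getAnywhere("+.gg")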
    
    # PRODUCE PLOT
    gp = base.Reduce(gg_plot_func[1][0], base.list(ggplot2.ggplot(df),
                                                   ggplot2.aes_string(x='Time', y='Data'),
                                                   ggplot2.geom_point(),
                                                   ggplot2.scale_x_datetime(limits=date_range)))
    # SAVE PLOT TO DISK
    ggplot2.ggsave(filename="myPlot.png", plot=gp, device="png", path="/path/to/plot/output")
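
The path argument above is a placeholder, and ggsave may fail if that directory does not exist. A minimal sketch for creating it from Python first; prepare_output_dir is a hypothetical helper, and the temporary directory is used only to keep the demonstration self-contained.

```python
import tempfile
from pathlib import Path


def prepare_output_dir(directory):
    """Create `directory` (and any missing parents) and return it as a string.

    Hypothetical helper for illustration; the returned string is meant to be
    passed as the `path` argument of ggsave.
    """
    out_dir = Path(directory).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir)


# Demonstration inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = prepare_output_dir(Path(tmp) / "plots" / "run1")
    print(Path(out_dir).is_dir())  # True
```

In the script above, the returned string would replace "/path/to/plot/output" in the ggsave call.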
    

    Output (rendered in Python)

    [Image: plot output]