Tags: r, plot, ggplot2, ggvis

ggvis: Combine multiple datasets in single plot


I have read a similar post on SO, but was not able to adapt the answer to my specific case. I am working with time series data and would like to combine two different data sets into the same plot. Although I could combine the data into one dataframe, I am really interested in understanding how to reference multiple datasets.

Mock Data:

require(ggvis)

dfa <- data.frame(
date_a = seq(from= as.Date("2015-06-10"), 
        to= as.Date("2015-07-01"), by= 1),
val_a = c(2585.150, 2482.200, 3780.186, 3619.601, 
        0.000, 0.000, 3509.734, 3020.405, 
        3271.897, 3019.003, 3172.084, 0.000, 
        0.000, 3319.927, 2673.428, 3331.382, 
        3886.957, 2859.887, 0.000, 0.000, 
        2781.443, 2847.377) )

dfb <- data.frame(
date_b = seq(from= as.Date("2015-07-02"), 
        to= as.Date("2015-07-15"), by= 1),
val_b = c(3250.75429, 3505.43477, 3208.69141,
        -2.08175, -27.30244, 3324.62348, 
        2820.91075, 3250.75429, 3505.43477,
        3208.69141, -2.08175, -27.30244,
        3324.62348, 2820.91075) )

Using the data provided above, I am able to create separate plots with the code below:

Separate Plots: (Works)

dfa %>%
ggvis( x= ~date_a , y= ~val_a, stroke := "black", opacity := 0.5 ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"),
    as.Date("2015-07-15") )) %>%
    layer_lines() %>% layer_points( fill := "black" )

dfb %>%
ggvis( x= ~date_b , y= ~val_b, stroke := "red", opacity := 0.5 ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"),
    as.Date("2015-07-15") )) %>%
    layer_lines() %>% layer_points( fill := "red" )

The desired output is these two lines (black and red) to be on the same plot. Here are a couple of unsuccessful attempts:

Attempt #1 adapted from SO post:

ggvis( data = dfa, x = ~date_a, y = ~val_a) %>% layer_lines(stroke := "black",  opacity := 0.5 ) %>%
    layer_lines( data = dfb, x= ~date_b , y= ~val_b, stroke := "red", 
    opacity := 0.5 ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"), 
    as.Date("2015-07-15") )) 

## Error in new_prop.default(x, property, scale, offset, mult, env, event,  : 
##  Unknown input to prop: c(16618, 16619, 16620, 16621, 16622, 16623, 16624, ...

Attempt #2 based on RStudio documentation:

ggvis( data = NULL, x = ~date_a, y = ~val_a) %>%
    layer_lines(stroke := "black",  opacity := 0.5, data = dfa ) %>%
    layer_lines( x= ~date_b , y= ~val_b, stroke := "red", 
    opacity := 0.5, data = dfb ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"), 
    as.Date("2015-07-15") )) 

## Error in func() : attempt to apply non-function

Here is a minimal implementation in ggplot2 that produces the desired output:

require(ggplot2)

ggplot() + 
  geom_line(data = dfa, aes(x = date_a, y = val_a ), colour = "black") +     
  geom_line(data = dfb, aes(x = date_b, y = val_b ), colour = "red") 


Again, a working solution and brief explanation would be greatly appreciated. Thank you in advance for the help.


Solution

  • It looks like layer_lines does not handle the data argument properly. I think you can use layer_paths here instead. The two work similarly, but layer_paths draws the points in the order they appear in the data, so you need to make sure your time series is sorted by date before plotting.

    First, the layer_paths function, like many other layer functions, has an explicit data argument.

    layer_paths
    function (vis, ..., data = NULL) 
    {
        add_mark(vis, "line", props(..., env = parent.frame()), data, 
            deparse2(substitute(data)))
    }
    <environment: namespace:ggvis>
    

    layer_lines, by contrast, accepts extra arguments through ... but has no data argument, and passing data through ... does not seem to work.

    layer_lines
    function (vis, ...) 
    {
        x_var <- vis$cur_props$x$value
        layer_f(vis, function(x) {
            x <- auto_group(x, exclude = c("x", "y"))
            x <- dplyr::arrange_(x, x_var)
            emit_paths(x, props(...))
        })
    }
    <environment: namespace:ggvis>
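
    The difference between the two signatures can also be checked programmatically instead of by reading the printed source. This is only a sketch: has_arg, paths_like and lines_like are hypothetical names introduced here, and the two stand-in functions copy only the argument lists printed above, not the real function bodies.

    ```r
    # Hypothetical helper (not part of ggvis): does function `f` declare
    # a formal argument named `arg`?
    has_arg <- function(f, arg) {
      arg %in% names(formals(f))
    }

    # Stand-ins that reproduce only the signatures printed above.
    paths_like <- function(vis, ..., data = NULL) NULL
    lines_like <- function(vis, ...) NULL

    has_arg(paths_like, "data")
    has_arg(lines_like, "data")

    # If ggvis is installed, run the same check on the real functions.
    if (requireNamespace("ggvis", quietly = TRUE)) {
      print(has_arg(ggvis::layer_paths, "data"))
      print(has_arg(ggvis::layer_lines, "data"))
    }
    ```

    The result for the real functions depends on the installed ggvis version, so it may differ from the source printed in this answer.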
    

    To test, I made a really basic graph, trying to use the data argument in layer_lines.

    ggvis() %>%
        layer_lines(data = dfb, x= ~date_b , y= ~val_b, stroke := "red") 
    

    This fails with an error.

    Error in func() : attempt to apply non-function

    Here's the same code using layer_paths instead:

    ggvis() %>%
        layer_paths(data = dfb, x= ~date_b , y= ~val_b, stroke := "red") 
    


    So that works, which means that as long as your dataset is ordered by date, your plot should work fine if you simply replace layer_lines with layer_paths.

    ggvis(data = dfa, x = ~date_a, y = ~val_a) %>% 
        layer_paths(stroke := "black",  opacity := 0.5 ) %>%
        layer_paths(data = dfb, x = ~date_b , y= ~val_b, stroke := "red", 
                    opacity := 0.5 ) %>% 
        scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"), as.Date("2015-07-15") )) 
    


    This seemed odd to me, and I may have missed something. I didn't see anything in the open or closed issues on the ggvis GitHub page, so you might consider filing one.
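
    Because layer_paths connects points in row order, the ordering caveat mentioned earlier matters whenever the rows are not already sorted by date. The sketch below uses base R only for the sorting step. dfb_shuffled and dfb_sorted are illustrative names, and the three rows are the first three observations of dfb, deliberately put out of order.

    ```r
    # Illustrative data: the first three rows of dfb, deliberately shuffled.
    dfb_shuffled <- data.frame(
      date_b = as.Date(c("2015-07-04", "2015-07-02", "2015-07-03")),
      val_b  = c(3208.69141, 3250.75429, 3505.43477)
    )

    # Sort by the x variable so the path is drawn from left to right.
    dfb_sorted <- dfb_shuffled[order(dfb_shuffled$date_b), ]

    # Build the plot only if ggvis is available; print `p` to display it.
    if (requireNamespace("ggvis", quietly = TRUE)) {
      library(ggvis)
      p <- ggvis() %>%
        layer_paths(data = dfb_sorted, x = ~date_b, y = ~val_b, stroke := "red")
    }
    ```

    The mock data in the question is already in date order, so this step is only needed for data that arrives unsorted.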