Adding multiple vlines for different dates in timeseries data


I'm trying to plot a line chart with multiple time series, where each line shows the sales trend over time for a specific object, all within the same start and end dates. My dataset is already in a "melted" (long) form and looks like this:

'data.frame':   468 obs. of  3 variables:
 $ date                : Date, format: "2019-04-11" "2019-04-12" "2019-04-13" ...
 $ Object                : chr  "Object1" "Object2" "Object3" "Object 4" ...
 $ daily_sales: int  1 257 178 177 255 240 231 214 193 174 ...

I have a set of dates for which I need vertical lines, stored in a Date vector, imp.dates.

When I try to plot a single vline it works fine (with the following code):

ggplot(df, aes(x=date,
               y=daily_sales,
               colour=Object,
               group=Object)) +
  geom_line() + 
  geom_vline(aes(xintercept=imp.dates[1]),
            linetype=4,
            colour="black")

However, when I try to draw multiple vlines:

ggplot(df, aes(x=date,
                   y=daily_sales,
                   colour=Object,
                   group=Object)) +
      geom_line() + 
      geom_vline(aes(xintercept=imp.dates),
                linetype=4,
                colour="black")

I get the following error:

Error: Aesthetics must be either length 1 or the same as the data (40): xintercept

The following are SO posts that I've looked at to no avail:

1. Multiple vlines in plot gives error, ggplot2
2. ggplot2: how to add text to multiple vertical lines (geom_vlines) on a time x-axis?
3. How to get a vertical geom_vline to an x-axis of class date?

The third comes very close, but my x variable is of class Date and not int, so I can't seem to get it to work.

Any help will be appreciated.


Solution

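  • You need to put imp.dates into a data frame and pass that as the data for the geom_vline() layer. Without its own data, the layer uses the plot's data frame, so anything mapped inside aes() has to be length 1 or match that data, which is why your vector of dates fails.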

    Here is some example data:

    set.seed(2867)
    df <- expand.grid(date = seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = 1L),
                      object = paste0("object", 1:4))
    df <- transform(df, daily_sales = rpois(nrow(df), lambda = 100))
    
    set.seed(1)
    imp <- data.frame(date = sample(unique(df$date), 4))
    

    Here I just randomly selected 4 dates from the series as the important ones; with your data you could do:

    imp <- data.frame(date = imp.dates)
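
    As a quick check that the conversion keeps the Date class, here is a minimal sketch. The three dates are hypothetical stand-ins for your imp.dates, not values from your data:

    ```r
    # Hypothetical stand-in for the asker's imp.dates vector
    imp.dates <- as.Date(c("2019-04-15", "2019-05-01", "2019-06-10"))

    # Wrap the vector in a one-column data frame for use as layer data
    imp <- data.frame(date = imp.dates)

    # The column should still be of class Date, with one row per date
    str(imp)
    ```

    If your dates were stored as character strings instead, you would need as.Date() first so that they match the Date x-axis.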
    

    It helps if the second data frame uses the same variable name, date, but I don't believe this is necessary; it just makes the code easier to parse in your head.

    Now we build up the plot as you had it (note that I changed Object to object in my code):

    ggplot(df, aes(x = date, y = daily_sales, colour = object, group = object)) +
      geom_line() + 
      geom_vline(data = imp,               ## 1
                 aes(xintercept = date),   ## 2
                 linetype = 4, colour = "black")
    

    Note that in line ## 1, where we add the geom_vline() layer, we set the data argument for the layer to be our data frame of important dates, imp. In line ## 2 we specify the name of the variable in imp that contains the dates we want to draw. The rest of the plotting code is just as you had it.
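
    If you want to confirm that the layer picked up every date, one option is to inspect the built layer data with layer_data(). The following is a sketch only, assuming ggplot2 is installed. The data frame imp2, its column name when, and the three dates are hypothetical, chosen to show that the column does not have to be called date:

    ```r
    library(ggplot2)

    set.seed(2867)
    df <- expand.grid(date = seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = 1L),
                      object = paste0("object", 1:4))
    df <- transform(df, daily_sales = rpois(nrow(df), lambda = 100))

    # Hypothetical important dates; the column is deliberately not named `date`
    imp2 <- data.frame(when = as.Date(c("2019-03-01", "2019-06-15", "2019-09-30")))

    p <- ggplot(df, aes(x = date, y = daily_sales, colour = object, group = object)) +
      geom_line() +
      geom_vline(data = imp2,              # layer-specific data
                 aes(xintercept = when),   # column holding the dates
                 linetype = 4, colour = "black")

    # layer_data() returns the computed data for one layer; layer 2 is geom_vline()
    vl <- layer_data(p, 2)
    nrow(vl)  # one row per vertical line
    ```

    This works with a differently named column because geom_vline() does not inherit the plot-level aesthetics, so only xintercept is looked up in imp2.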

    This produces a plot (a mess, since it's random data) that now includes the 4 selected important dates represented as vertical lines.
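
    As an aside that is not part of the approach above: geom_vline() also accepts xintercept as a plain layer parameter outside aes(), in which case a vector gives one line per element. This is a hedged sketch with small hypothetical data; it assumes a reasonably recent ggplot2, since older releases were reported to need extra handling for Date values here:

    ```r
    library(ggplot2)

    # Small hypothetical data set: 60 days for two objects
    set.seed(1)
    df <- expand.grid(date = seq(as.Date("2019-01-01"), by = 1L, length.out = 60),
                      object = paste0("object", 1:2))
    df <- transform(df, daily_sales = rpois(nrow(df), lambda = 100))

    # Hypothetical stand-in for the asker's imp.dates vector
    imp.dates <- as.Date(c("2019-01-10", "2019-01-25", "2019-02-14"))

    # xintercept is set as a parameter here, not mapped inside aes(),
    # so its length does not have to match the plot data
    p <- ggplot(df, aes(x = date, y = daily_sales, colour = object, group = object)) +
      geom_line() +
      geom_vline(xintercept = imp.dates, linetype = 4, colour = "black")
    ```

    The data frame approach remains the more flexible one if you later want to map other aesthetics, such as colour or labels, to each date.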