
How can I automatically highlight multiple sections of the x axis in ggplot2?


I have a line plot that tracks counts over time for multiple factors. A mock version of the data I am working with would be:

step   factor   count
1      a        10
1      b        0
1      c        5
2      a        5
2      b        10
2      c        0
... etc.

The counts are influenced by an external event, and for each step I know whether that event is happening. This information can be in a separate data frame or in the same one (it doesn't really matter), and it looks like this:

step   event
1      FALSE
2      FALSE
...
10     TRUE
11     TRUE
...
30     FALSE
... etc.
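If the flags start out in their own data frame, base R `merge()` can attach them to the counts so everything lives in one table. This is a minimal sketch with toy data: the step 1 and 2 counts come from the mock table above, while the step 3 counts and the step 3 event flag are made up.

```r
# Toy versions of the two tables; step-3 counts and the step-3 flag are invented
counts <- data.frame(step   = rep(1:3, each = 3),
                     factor = rep(c("a", "b", "c"), times = 3),
                     count  = c(10, 0, 5, 5, 10, 0, 8, 2, 1))
events <- data.frame(step  = 1:3,
                     event = c(FALSE, FALSE, TRUE))

# Copy each step's event flag onto every row that has the same step
combined <- merge(counts, events, by = "step")
combined
```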

I am writing this script to automate plot creation because I will be dealing with lots of data. I know I could use geom_rect() to hard-code highlighting rectangles, but doing that by hand would take far too much time, especially since the event switches on and off at different steps in each data set.

Is there any way that I can extract the x limits for geom_rect() dynamically from the data and create as many rectangles as the data set needs? Or is this completely hopeless?


Solution

  • This may be a bit hacky, but I think it gives the result you are looking for. Let me create some data first that roughly corresponds to yours:

    # Three random-walk series (groups a, b and c) over 100 steps
    df <- data.frame(step = rep(1:100, 3), group = rep(letters[1:3], each = 100),
                     value = c(cumsum(c(50, runif(99, -1, 1))), 
                               cumsum(c(50, runif(99, -1, 1))),
                               cumsum(c(50, runif(99, -1, 1)))))
    
    # One random TRUE/FALSE event flag per step
    df2 <- data.frame(step = 1:100, event = sample(c(TRUE, FALSE), 100, replace = TRUE))
    

    So the starting plot from df is drawn with:

    library(ggplot2)
    
    ggplot(df, aes(step, value, colour = group)) + geom_line()
    


    and the event data frame looks like this:

    head(df2)
    #>   step event
    #> 1    1 FALSE
    #> 2    2 FALSE
    #> 3    3 FALSE
    #> 4    4  TRUE
    #> 5    5 FALSE
    #> 6    6  TRUE
    

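    The idea is to add a semi-transparent red geom_area() layer built from df2, so that FALSE steps map to 0 (well below the visible range) and TRUE steps map to 1000 (well above it), and then use coord_cartesian() to set the y limits close to the range of your main data. coord_cartesian() only zooms the view, so the tall area values are kept; setting the limits with ylim() or scale_y_continuous(limits = ...) would instead drop them as out of range. The result is red vertical bands wherever your event is TRUE: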

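    ggplot(df, aes(step, value, colour = group)) + 
      geom_line() + 
      # event is logical, so 1000 * event is 1000 on TRUE steps and 0 on FALSE steps
      geom_area(data = df2, aes(x = step, y = 1000 * event), 
                inherit.aes = FALSE, fill = "red", alpha = 0.2) + 
      # Zoom to the lines' range without discarding the tall area values
      coord_cartesian(ylim = c(40, 60))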
    
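If you also want the y window to come from the data instead of the hard-coded c(40, 60) and 1000, a possible variant is sketched below. This is a sketch under my own assumptions, not the code above: it swaps geom_area() for geom_ribbon() so the band's baseline can sit below the data even when values go negative (geom_area() always fills up from 0). The names y_lim, pad, lo, hi and bands are just illustrative.

```r
library(ggplot2)

# Toy data in the same shape as df and df2 above
df  <- data.frame(step  = rep(1:100, 3),
                  group = rep(letters[1:3], each = 100),
                  value = as.vector(replicate(3, cumsum(c(50, runif(99, -1, 1))))))
df2 <- data.frame(step  = 1:100,
                  event = sample(c(TRUE, FALSE), 100, replace = TRUE))

# Visible y window taken from the data rather than hard-coded
y_lim <- range(df$value)
pad   <- 100 * diff(y_lim)   # far outside the window, so band edges look vertical
lo    <- y_lim[1] - pad
hi    <- y_lim[2] + pad

# TRUE steps span from far below to far above the window;
# FALSE steps collapse to zero height at lo
bands <- transform(df2, ymin = lo, ymax = ifelse(event, hi, lo))

p <- ggplot(df, aes(step, value, colour = group)) +
  geom_line() +
  geom_ribbon(data = bands, aes(x = step, ymin = ymin, ymax = ymax),
              inherit.aes = FALSE, fill = "red", alpha = 0.2) +
  # Zoom without dropping the out-of-window ribbon values
  coord_cartesian(ylim = y_lim)
p
```

The factor of 100 in pad only controls how steep the band edges are: the further lo and hi sit outside the window, the closer the edges get to vertical.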

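For comparison, the geom_rect() approach from the question can also be driven entirely from the data. You run-length encode the event column with rle(), keep the runs of TRUE, and turn each run into one rectangle. This sketch assumes df2 is sorted by step with one row per step, and event_runs() is a hypothetical helper, not a ggplot2 function. Because geom_area() joins consecutive steps with straight lines, its bands reach roughly one step beyond each TRUE run. Here the edges are placed half a step either side of the run instead.

```r
library(ggplot2)

# Toy data in the same shape as df and df2 above
df  <- data.frame(step  = rep(1:100, 3),
                  group = rep(letters[1:3], each = 100),
                  value = as.vector(replicate(3, cumsum(c(50, runif(99, -1, 1))))))
df2 <- data.frame(step  = 1:100,
                  event = sample(c(TRUE, FALSE), 100, replace = TRUE))

# Hypothetical helper: one row per run of consecutive TRUE steps.
# Assumes step is sorted with no gaps.
event_runs <- function(step, event) {
  r      <- rle(event)
  ends   <- cumsum(r$lengths)        # index of the last element of each run
  starts <- ends - r$lengths + 1     # index of the first element of each run
  data.frame(xmin = step[starts[r$values]] - 0.5,
             xmax = step[ends[r$values]] + 0.5)
}

rects <- event_runs(df2$step, df2$event)

p <- ggplot(df, aes(step, value, colour = group)) +
  # Bands first, so the lines are drawn on top of them
  geom_rect(data = rects,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "red", alpha = 0.2) +
  geom_line()
p
```

Setting ymin = -Inf and ymax = Inf makes each rectangle span the full panel height, so no coord_cartesian() zoom is needed.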