Background bars in ggplot2 using geom_rect


I have daily flow data in a dataset I've called "dat1_na".

It spans roughly 1940 to 2020, so there are 18,780 rows in this dataset.

str(dat1_na) is:

'data.frame':   18780 obs. of  9 variables:
 ...
 $ MLd    : num  96 34 34 20 34 34 52 34 34 26 ...
 $ Date   : Date, format: "1943-09-19" "1943-09-07" "1943-09-08" "1943-09-11" ...
 ...
 $ Climate: chr  "Dry" "Dry" "Dry" "Dry" ...

So it's a simple line graph (hydrograph) showing MLd (the daily flow rate) against time, which is no problem. However, I'm trying to shade the background using geom_rect according to the 'Climate' column, which has only two possible values: "Dry" and "Wet". The issue is that I can't get the background to show up properly. I know the data is being read correctly because, if I tweak my code a bit, I can see the dry years and wet years where they should be:

ggplot(dat1_na, aes(x=Date, y=MLd, xmin=Date, xmax=Date, ymin=0, ymax=6000)) + 
  geom_line(colour = "#231EDC") + 
  geom_rect(aes(colour=Climate), alpha=0.2) +
  theme_minimal() 

graph using aes(colour=Climate)

What I really want is for the shading to be transparent and sit behind the line graph, but I can't get that working. I've tried a few versions of the code, including putting things in the ggplot() call or the aes() call, but nothing really works. I have code which I think should work, but nothing from geom_rect shows up (except in the legend, which looks correct).

ggplot(dat1_na, aes(x=Date, y=MLd, xmin=Date, xmax=Date, ymin=0, ymax=6000)) + 
  geom_line(colour = "#231EDC") + 
  geom_rect(aes(fill=Climate), linetype=0, alpha=0.2) +
  theme_minimal() 

graph using aes(fill=Climate)

I'm wondering if it's to do with the number of rows in my data (~18,000) making each geom_rect rectangle too small to see, so that only the outline is large enough to show up. The trouble with that is I can't get the outline to be transparent. I assume the code is drawing a rectangle for each row, either pink or green depending on the value of dat1_na$Climate.

Does anyone have any suggestions?

Cheers


Solution

  • It's difficult to demonstrate without a reproducible example, so let's create one with the same column names and types as your own data:

    set.seed(8)
    
    dat1_na <- data.frame(MLd = 40 + cumsum(sample(seq(-5, 5), 100, TRUE)),
                          Date = sort(as.Date(sample(seq(-9601, 9601), 100, TRUE),
                                              origin = '1970-01-01')),
                          Climate = c('Dry', 'Wet')[round(1.5 + 
                                      cumsum(runif(100, -0.01, 0.01)))])
    dat1_na
    

    The key here is to create a small data frame for the rectangles, based on the start and end dates of each change in Climate:

    library(tidyverse)
    
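    rect_frame <- dat1_na %>%
      arrange(Date) %>%
      # flag rows where Climate differs from the previous row
      mutate(change = lag(Climate) != Climate,
             # always keep the first and last rows so every period has a start and an end
             change = c(TRUE, change[-c(1, nrow(.))], TRUE)) %>%
      filter(change) %>%
      # each period runs until the next flagged date (NA for the final row)
      mutate(End_Date = lead(Date))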
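
    If the lag/lead logic above is hard to follow, the same idea (one row per run of identical Climate values) can be sketched in base R with rle(). This is only an illustration on made-up toy data; the names toy and rect_toy are hypothetical and not part of your dataset:

    ```r
    # Toy data with hypothetical values, one row per day
    toy <- data.frame(
      Date    = as.Date("2000-01-01") + 0:5,
      Climate = c("Dry", "Dry", "Wet", "Wet", "Wet", "Dry")
    )

    toy <- toy[order(toy$Date), ]

    # Run-length encode Climate: one entry per consecutive run of the same value
    runs <- rle(as.character(toy$Climate))

    last_row  <- cumsum(runs$lengths)          # row index where each run ends
    first_row <- last_row - runs$lengths + 1   # row index where each run starts
    # Extend each run to the first date of the next run so the rectangles touch;
    # the final run simply ends on the last date in the data
    next_row  <- pmin(last_row + 1, nrow(toy))

    rect_toy <- data.frame(
      Climate  = runs$values,
      Date     = toy$Date[first_row],
      End_Date = toy$Date[next_row]
    )
    rect_toy
    ```

    A data frame built this way has the same Climate, Date and End_Date columns that the geom_rect layer needs, so it could be passed as the data argument in the same way as rect_frame.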
    

    Now, when we plot, ensure that we draw the rect layer first. It should be filled by the fill aesthetic rather than the color aesthetic, and the layer needs to be passed rect_frame as its data argument:

    ggplot(dat1_na, aes(x = Date, y = MLd)) + 
      geom_rect(data = rect_frame, 
                aes(fill = Climate, xmin = Date, xmax = End_Date, 
                    ymin = -Inf, ymax = Inf), alpha = 0.2) +
      geom_line(colour = "#231EDC") + 
      theme_minimal()
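
    As for why your original code showed nothing: your guess about the rectangles being too small looks right. With xmin = Date and xmax = Date, each row gets a rectangle of zero width, so there is no area to fill and only the outline can be drawn. A minimal sketch to check this, using hypothetical toy data rather than your real dataset:

    ```r
    library(ggplot2)

    # Toy data with hypothetical values, one row per day
    toy <- data.frame(
      Date    = as.Date("2000-01-01") + 0:5,
      MLd     = c(30, 32, 45, 50, 48, 31),
      Climate = c("Dry", "Dry", "Wet", "Wet", "Wet", "Dry")
    )

    # Same mapping as in the question: xmin and xmax both point at Date
    p <- ggplot(toy, aes(xmin = Date, xmax = Date, ymin = 0, ymax = 60)) +
      geom_rect(aes(fill = Climate), alpha = 0.2)

    # layer_data() returns the values ggplot2 computed for a given layer
    rects  <- layer_data(p, 1)
    widths <- rects$xmax - rects$xmin

    # Every rectangle has zero width, so no fill is visible
    all(widths == 0)
    ```

    Giving each rectangle a real start and end date, as rect_frame does, is what makes the fill visible.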
    
