Search code examples
rggplot2facetfacet-grid

ggplot2 split color histograms according to data: facet_grid


Building on this question:

Is there a way to create a grid of histograms where the bins are different colors above vs. below arbitrary values (without overlapping bins), without needing to refer to the environment outside of ggplot()? I can do this with a single histogram, like this (using median for illustration purposes):

set.seed(123)

value = stats::rnorm(100, mean = 0, sd = 1)

df = data.frame(value)

df %>%
  {
    ggplot(data = ., aes(x = value, fill = ifelse(value > median(value), "0", "1"))) +
      geom_histogram(boundary = median(.$value), alpha = 0.5, position = "identity") +
      theme(legend.position = "none")
  }

enter image description here

Can this be done for faceted plots, where each plot uses a different value, according to a grouping variable? E.g. this doesn't work:

set.seed(456)

value = stats::rnorm(200, mean = 0, sd = 1)
group = c(rep(1,100), rep(2,100))
    
df = data.frame(value, group)

df %>%
  dplyr::mutate(value = ifelse(group == 2, value + 1, value)) %>%
  dplyr::group_by(group) %>%
  dplyr::mutate(above_median = value > median(value)) %>%
  {
    ggplot(data = ., aes(x = value, fill = above_median)) +
      facet_grid(rows = group) +
      geom_histogram(boundary = median(.$value), alpha = 0.5, position = "identity") +
      theme(legend.position = "none")
  }

enter image description here


Solution

  • One option would be to add you histograms using multiple geom_histogram layers, i.e. split your data by group, then use lapply to add a geom_histogram for each group:

    library(dplyr, warn=FALSE)
    library(ggplot2)
    
    df %>%
      dplyr::mutate(value = ifelse(group == 2, value + 1, value)) %>%
      dplyr::group_by(group) %>%
      dplyr::mutate(above_median = value > median(value)) %>%
      {
        ggplot(data = ., aes(x = value, fill = above_median)) +
          facet_grid(rows = vars(group)) +
          lapply(split(., .$group), function(x) {
            geom_histogram(data = x, boundary = median(x$value), alpha = 0.5, position = "identity")
          }) +
          theme(legend.position = "none")
      }
    #> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
    #> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.