Tags: r, ggplot2, facet-grid, cowplot

Incorrect colour gradient when using cowplot to patch together plots


Say I have a data set with x and y values that are grouped according to two variables: grp is a, b, or c, while subgrp is D, E, or F.

  • a has y values in [0, 1]
  • b has y values in [10, 11]
  • c has y values in [100, 101].

I'd like to plot y against x with the colour of the point defined by y for all grp and subgrp combinations. Since each grp has very different y values, I can't just use facet_grid alone, as the colour scales would be useless. So, I plot each grp with its own scale then patch them together with plot_grid from cowplot. I also want to use a three-point gradient specified by scale_colour_gradient2. My code looks like this:

# Set RNG seed
set.seed(42)

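# Toy data frame (grp is made a factor explicitly so that levels(df$grp) below
# also works in R >= 4.0, where data.frame() no longer converts strings to factors)
df <- data.frame(x = runif(270), y = runif(270) + rep(c(0, 10, 100), each = 90),
                 grp = factor(rep(letters[1:3], each = 90)),
                 subgrp = rep(LETTERS[4:6], 90))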

head(df)
#>           x         y grp subgrp
#> 1 0.9148060 0.1362958   a      D
#> 2 0.9370754 0.7853494   a      E
#> 3 0.2861395 0.4533034   a      F
#> 4 0.8304476 0.1357424   a      D
#> 5 0.6417455 0.8852210   a      E
#> 6 0.5190959 0.3367135   a      F

# Load libraries
library(cowplot)
library(ggplot2)
library(dplyr)

# Plotting list
g_list <- list()

# Loop through groups 'grp'
for(i in levels(df$grp)){
  # Subset the data
  df_subset <- df %>% filter(grp == i)
  
  # Calculate the midpoint
  mp <- mean(df_subset$y)
  
  # Print midpoint
  message("Midpoint: ", mp)
  
  g <- ggplot(df_subset) + geom_point(aes(x = x, y = y, colour = y))
  g <- g + facet_grid(. ~ subgrp) + ggtitle(i)
  g <- g + scale_colour_gradient2(low = "blue", high = "red", mid = "yellow", midpoint = mp)
  g_list[[i]] <- g
}
#> Midpoint: 0.460748857570191
#> Midpoint: 10.4696476330981
#> Midpoint: 100.471083269571

plot_grid(plotlist = g_list, ncol = 1)

Created on 2019-04-17 by the reprex package (v0.2.1)

In this code, I specify the midpoint of the colour gradient as the mean of y for each grp. I print this and verify that it is correct. It is.

My question: why are my colour scales incorrect for the first two plots?

It appears the same range is applied to each grp despite subsetting the data. If I replace for(i in levels(df$grp)){ with for(i in levels(df$grp)[1]){, the colour scale is correct for the single plot that is produced.


Update

Okay, this is weird. Inserting ggplot_build(g)$data[[1]]$colour immediately before g_list[[i]] <- g solves the problem. But, why?
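One plausible reading of this behaviour, sketched in base R: if the scale keeps its midpoint argument as a lazily evaluated promise, then building the plot inside the loop evaluates that promise while mp still holds the right value. The make_scale() function below is a hypothetical stand-in for a scale constructor, not ggplot2 code, and the assumption that ggplot2 treats midpoint this way may not hold in every version.

```r
# Hypothetical stand-in for a scale constructor: returns a closure over `mid`.
# `mid` is a lazy argument, so it is only evaluated when the closure first uses it.
make_scale <- function(mid) {
  function(x) x - mid
}

mp <- 1
s_untouched <- make_scale(mp)    # promise for `mp` is still pending
s_touched   <- make_scale(mp)
touched_now <- s_touched(0)      # first call resolves the promise while mp == 1

mp <- 100                        # simulate the next loop iteration

untouched_later <- s_untouched(0)  # promise resolved only now, so it sees mp == 100
touched_later   <- s_touched(0)    # promise was already resolved, still uses 1

print(c(untouched_later, touched_later))
```

In this analogy, the early call to s_touched() plays the role of ggplot_build(g): it does nothing useful by itself, but it fixes the value of the argument before mp is overwritten.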



Solution

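  • Long story short, you're creating unevaluated promises and then evaluating them at a time when the values they refer to have changed. Here, midpoint = mp is most likely not evaluated when the scale is created but only when the plot is drawn, and by then the loop has overwritten mp with the value for the last group. This problem is generally avoided if you use a functional programming style rather than procedural code, i.e., define a function that does the work and then use an apply function for the loop, so that each call gets its own copy of mp.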

    set.seed(42)
    
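    # Toy data frame (grp is made a factor explicitly so that levels(df$grp) below
    # also works in R >= 4.0, where data.frame() no longer converts strings to factors)
    df <- data.frame(x = runif(270), y = runif(270) + rep(c(0, 10, 100), each = 90),
                     grp = factor(rep(letters[1:3], each = 90)),
                     subgrp = rep(LETTERS[4:6], 90))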
    
    library(cowplot)
    library(ggplot2)
    library(dplyr)
    
    # Loop through groups 'grp'
    g_list <- lapply(
      levels(df$grp), 
      function(i) {
        # Subset the data
        df_subset <- df %>% filter(grp == i)
    
        # Calculate the midpoint
        mp <- mean(df_subset$y)
    
        # Print midpoint
        message("Midpoint: ", mp)
    
        g <- ggplot(df_subset) + geom_point(aes(x = x, y = y, colour = y))
        g <- g + facet_grid(. ~ subgrp) + ggtitle(i)
        g <- g + scale_colour_gradient2(low = "blue", high = "red", mid = "yellow", midpoint = mp)
        g
      }
    )
    #> Midpoint: 0.460748857570191
    #> Midpoint: 10.4696476330981
    #> Midpoint: 100.471083269571
    
    plot_grid(plotlist = g_list, ncol = 1)
    

    Created on 2019-04-17 by the reprex package (v0.2.1)
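The difference between the for loop and the lapply version can be reproduced without ggplot2. The following is a minimal base R sketch: make_rescaler() and make_rescaler_forced() are hypothetical stand-ins for a scale constructor that stores its midpoint, and the midpoints are made-up values close to the ones printed above.

```r
# Hypothetical stand-in for a scale constructor. `mid` is an ordinary lazy
# argument: it is not evaluated until the returned function first uses it.
make_rescaler <- function(mid) {
  function(x) x - mid
}

# Same factory, but force() evaluates `mid` immediately.
make_rescaler_forced <- function(mid) {
  force(mid)
  function(x) x - mid
}

midpoints <- c(0.5, 10.5, 100.5)

# 1. for loop: every pending promise refers to the same variable `mp`
loop_lazy <- list()
loop_forced <- list()
for (k in seq_along(midpoints)) {
  mp <- midpoints[k]
  loop_lazy[[k]] <- make_rescaler(mp)
  loop_forced[[k]] <- make_rescaler_forced(mp)
}

# 2. lapply: each call of the anonymous function has its own local `mp`
apply_lazy <- lapply(midpoints, function(m) {
  mp <- m
  make_rescaler(mp)
})

# Evaluate everything only now, after the loops have finished
res_loop_lazy   <- sapply(loop_lazy, function(f) f(0))
res_loop_forced <- sapply(loop_forced, function(f) f(0))
res_apply_lazy  <- sapply(apply_lazy, function(f) f(0))

print(res_loop_lazy)    # all three use the last midpoint, 100.5
print(res_loop_forced)  # each uses its own midpoint
print(res_apply_lazy)   # each uses its own midpoint
```

Only the lazy closures created in the for loop share a midpoint, because all three promises look up the same mp after it has been overwritten. Wrapping the work in a function, as in the lapply version, gives each promise its own mp to look up.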