Say I have a data set with x and y values that are grouped according to two variables: grp is a, b, or c, while subgrp is D, E, or F.

a has y values in [0, 1]
b has y values in [10, 11]
c has y values in [100, 101]

I'd like to plot y against x with the colour of each point defined by y, for all grp and subgrp combinations. Since each grp has very different y values, I can't use facet_grid alone, as a shared colour scale would be useless. So I plot each grp with its own scale and then patch the plots together with plot_grid from cowplot. I also want to use a three-point gradient specified by scale_colour_gradient2. My code looks like this:
# Set RNG seed
set.seed(42)
# Toy data frame
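# stringsAsFactors = TRUE keeps grp a factor on R >= 4.0, so levels(df$grp) works
df <- data.frame(x = runif(270), y = runif(270) + rep(c(0, 10, 100), each = 90),
                 grp = rep(letters[1:3], each = 90), subgrp = rep(LETTERS[4:6], 90),
                 stringsAsFactors = TRUE)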
head(df)
#>           x         y grp subgrp
#> 1 0.9148060 0.1362958   a      D
#> 2 0.9370754 0.7853494   a      E
#> 3 0.2861395 0.4533034   a      F
#> 4 0.8304476 0.1357424   a      D
#> 5 0.6417455 0.8852210   a      E
#> 6 0.5190959 0.3367135   a      F
# Load libraries
library(cowplot)
library(ggplot2)
library(dplyr)
# Plotting list
g_list <- list()
# Loop through groups 'grp'
for(i in levels(df$grp)){
  # Subset the data
  df_subset <- df %>% filter(grp == i)
  # Calculate the midpoint
  mp <- mean(df_subset$y)
  # Print midpoint
  message("Midpoint: ", mp)
  g <- ggplot(df_subset) + geom_point(aes(x = x, y = y, colour = y))
  g <- g + facet_grid(. ~ subgrp) + ggtitle(i)
  g <- g + scale_colour_gradient2(low = "blue", high = "red", mid = "yellow", midpoint = mp)
  g_list[[i]] <- g
}
#> Midpoint: 0.460748857570191
#> Midpoint: 10.4696476330981
#> Midpoint: 100.471083269571
plot_grid(plotlist = g_list, ncol = 1)
Created on 2019-04-17 by the reprex package (v0.2.1)
In this code, I specify the midpoint of the colour gradient as the mean of y for each grp. I print this and verify that it is correct. It is.

My question: why are my colour scales incorrect for the first two plots?

It appears the same range is applied to each grp despite subsetting the data. If I replace for(i in levels(df$grp)){ with for(i in levels(df$grp)[1]){, the colour scale is correct for the single plot that is produced.

Okay, this is weird. Inserting ggplot_build(g)$data[[1]]$colour immediately before g_list[[i]] <- g solves the problem. But why?
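Long story short, you're creating unevaluated promises and then evaluating them at a time when the original values are gone. R evaluates function arguments lazily, so midpoint = mp is not evaluated inside the loop; it is only evaluated when the plot is actually built, i.e. when plot_grid draws it. By then the loop has finished and mp holds the value from the last iteration, which would explain why only the last plot looks right. It would also explain why calling ggplot_build(g) inside the loop helps: it forces the promise while mp still has the right value.

This problem is generally avoided if you use a functional programming style rather than procedural code, i.e. define a function that does the work and then use an apply function for the loop: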
set.seed(42)
# Toy data frame
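# stringsAsFactors = TRUE keeps grp a factor on R >= 4.0, so levels(df$grp) works
df <- data.frame(x = runif(270), y = runif(270) + rep(c(0, 10, 100), each = 90),
                 grp = rep(letters[1:3], each = 90), subgrp = rep(LETTERS[4:6], 90),
                 stringsAsFactors = TRUE)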
library(cowplot)
library(ggplot2)
library(dplyr)
# Loop through groups 'grp'
g_list <- lapply(
  levels(df$grp),
  function(i) {
    # Subset the data
    df_subset <- df %>% filter(grp == i)
    # Calculate the midpoint
    mp <- mean(df_subset$y)
    # Print midpoint
    message("Midpoint: ", mp)
    g <- ggplot(df_subset) + geom_point(aes(x = x, y = y, colour = y))
    g <- g + facet_grid(. ~ subgrp) + ggtitle(i)
    g <- g + scale_colour_gradient2(low = "blue", high = "red", mid = "yellow", midpoint = mp)
    g
  }
)
#> Midpoint: 0.460748857570191
#> Midpoint: 10.4696476330981
#> Midpoint: 100.471083269571
plot_grid(plotlist = g_list, ncol = 1)
Created on 2019-04-17 by the reprex package (v0.2.1)
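The lazy-evaluation mechanism behind this can be reproduced in base R, without ggplot2. The following is only a minimal sketch: make_getter and make_getter_forced are hypothetical helpers invented for illustration, standing in for any function that stores an argument without evaluating it (which is what the scale appears to do with midpoint here).

```r
# Hypothetical helper: returns a closure WITHOUT evaluating its argument.
# 'value' stays an unevaluated promise pointing at the caller's variable.
make_getter <- function(value) {
  function() value
}

getters <- list()
for (i in 1:3) {
  mp <- i * 10
  getters[[i]] <- make_getter(mp)  # promise refers to 'mp', not to its current value
}

# The promises are evaluated only now, after the loop; 'mp' is 30 by then
lazy_values <- vapply(getters, function(f) f(), numeric(1))
print(lazy_values)
#> [1] 30 30 30

# Hypothetical variant: force() evaluates the promise immediately
make_getter_forced <- function(value) {
  force(value)
  function() value
}

forced <- list()
for (i in 1:3) {
  mp <- i * 10
  forced[[i]] <- make_getter_forced(mp)
}

forced_values <- vapply(forced, function(f) f(), numeric(1))
print(forced_values)
#> [1] 10 20 30
```

In the for loop every promise points at the same variable mp in the same environment, so they all see its final value. With lapply, each call to function(i) gets its own environment with its own mp, so a promise evaluated later still finds the value that belonged to that iteration.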