Consider this simple example:
library(dplyr)
library(forcats)
library(ggplot2)
mydata <- tibble(cat1 = c(1,1,2,2),  # tibble() replaces the deprecated data_frame()
cat2 = c('a','b','a','b'),
value = c(10,20,-10,-20),
time = c(1,2,1,2))
mydata <- mydata %>% mutate(cat1 = factor(cat1),
cat2 = factor(cat2))
> mydata
# A tibble: 4 x 4
cat1 cat2 value time
<fct> <fct> <dbl> <dbl>
1 1 a 10.0 1.00
2 1 b 20.0 2.00
3 2 a -10.0 1.00
4 2 b -20.0 2.00
Now, I want to create a chart where I interact the two factor variables. I know I can use interaction() in ggplot2 (see below).
My big problem is that I do not know how to automate the labeling (and the colouring) of the interactions, so that I can avoid manual errors when using scale_colour_manual().
For instance:
ggplot(mydata,
aes(x = time, y = value, col = interaction(cat1, cat2) )) +
geom_point(size=15) + theme(legend.position="bottom")+
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme(legend.position="bottom",
legend.text=element_text(size=12, face = "bold")) +
scale_colour_manual(name = ""
, values=c("red","red4","royalblue","royalblue4")
, labels=c("1-b","1-a"
,"2-a","2-b"))
shows:
which has the wrong labels because of a deliberate mistake I made in scale_colour_manual(). Indeed, the bright red dot is 1-a and not 1-b (note how the labels are simply the concatenation of the factor levels). The idea is that with more factor levels, guessing the right order can be tricky.
Is there a way to automate this labeling (even better: labeling AND coloring)? Perhaps using forcats? Perhaps creating the labels as strings in the dataframe beforehand?
Thanks!
If the number of factor levels for cat1 / cat2 is not fixed (and could potentially be much larger than 2), I would calculate the appropriate colours with hsv() rather than assign them manually.
The colour cheatsheet here summarises the HSV colour model rather nicely:
Hue (h) is essentially your rainbow colour wheel, Saturation (s) determines how intense the colour is, and Value (v) how dark it is. Each parameter accepts values in the range [0, 1].
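To make the three parameters concrete, here is a minimal sketch using base R's hsv() (from grDevices, loaded by default). The variable names are only illustrative and are not part of the original answer.

```r
# Hue picks the position on the colour wheel (s and v held at full strength)
red  <- hsv(h = 0,   s = 1, v = 1)   # "#FF0000"
cyan <- hsv(h = 0.5, s = 1, v = 1)   # "#00FFFF"

# Lowering saturation washes the colour out towards white
pale_red <- hsv(h = 0, s = 0.3, v = 1)

# Lowering value darkens the colour towards black
dark_red <- hsv(h = 0, s = 1, v = 0.3)

# hsv() is vectorised, so a whole palette can be built in one call
palette <- hsv(h = c(0, 0.5), s = 1, v = 1)
```

Because hsv() is vectorised, it can be called directly inside mutate() on whole columns, which is what the code further down relies on.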
Here's how I would adapt it for this use case:
mydata2 <- mydata %>%
# use "-" instead of the default "." since we are using that for the labels anyway
mutate(interacted.variable = interaction(cat1, cat2, sep = "-")) %>%
# cat1: assign hue evenly across the whole wheel,
# cat2: restrict both saturation & value to the range [0.3, 1], as colours can
# look too faint / dark otherwise
mutate(colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
# create the vector of colours for scale_colour_manual()
manual.colour <- mydata2 %>% select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
> colour.vector
1-a 1-b 2-a 2-b
"#3AA6A6" "#00FFFF" "#A63A3A" "#FF0000"
With the colours calculated automatically for any number of factors, plotting becomes quite straightforward:
ggplot(mydata2,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector)) +
theme(legend.position = "bottom")
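The reason the named vector removes the guessing is that ggplot2 matches a named values vector to the factor levels by name rather than by position. The positional order is easy to get wrong, because interaction() by default orders its levels with the first factor varying fastest. A small base R check, using the same toy factors as in the question (the object name interacted is only for illustration):

```r
cat1 <- factor(c(1, 1, 2, 2))
cat2 <- factor(c("a", "b", "a", "b"))

interacted <- interaction(cat1, cat2, sep = "-")

# Values as they appear in the data, row by row
as.character(interacted)
# "1-a" "1-b" "2-a" "2-b"

# Level order (what an unnamed values / labels vector would be matched against)
levels(interacted)
# "1-a" "2-a" "1-b" "2-b"
```

With an unnamed vector, the second colour would go to 2-a, not 1-b, which is the kind of slip the question describes.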
An illustration with more factor levels (the code is the same except for the addition of guide_legend(byrow = TRUE) in the colour scale):
mydata3 <- data.frame(
cat1 = factor(rep(1:3, times = 5)),
cat2 = factor(rep(LETTERS[1:5], each = 3)),  # must be a factor for as.integer() / levels() below
value = 1:15,
time = 15:1
) %>%
mutate(interacted.variable = interaction(cat1, cat2, sep = "-"),
colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
manual.colour <- mydata3 %>% arrange(cat1, cat2) %>%
select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
ggplot(mydata3,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector),
guide = guide_legend(byrow = TRUE)) +
theme(legend.position = "bottom")
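Since the same colour calculation appears in both examples, it could be wrapped in a small helper. The following is only a sketch: interaction_colours() and its arguments are hypothetical names, not part of any package, and it assumes both inputs are already factors. It returns a named vector covering every combination of levels, including combinations that do not occur in the data.

```r
# Hypothetical helper: one colour per combination of levels of f1 and f2.
# f1 drives the hue; f2 drives saturation and value within [lower, 1].
interaction_colours <- function(f1, f2, sep = "-", lower = 0.3) {
  stopifnot(is.factor(f1), is.factor(f2))
  n1 <- nlevels(f1)
  n2 <- nlevels(f2)

  # All level combinations, first factor varying fastest
  grid <- expand.grid(i = seq_len(n1), j = seq_len(n2))

  shade <- lower + (1 - lower) * grid$j / n2
  colours <- hsv(h = grid$i / n1, s = shade, v = shade)

  # Name each colour after its combination, e.g. "1-a"
  names(colours) <- paste(levels(f1)[grid$i], levels(f2)[grid$j], sep = sep)
  colours
}

colour.vector <- interaction_colours(factor(c(1, 1, 2, 2)),
                                     factor(c("a", "b", "a", "b")))
```

The result can be passed as values = colour.vector to scale_colour_manual() exactly as above, provided the same sep is used in interaction().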