I'm trying to generate multiple plots, each with a variable number of geom_line layers, based on a list that another package generates by grouping the column names of my original data. Both my data and the list are huge, so I tried to solve this with a nested loop. It kind of worked, but I'm struggling to find a better way to assign a unique color to each geom_line without randomizing the color. I'm new to coding, so this might not be clear; I apologize in advance.
This is the first thing I tried:
# required package
library(tidyverse)
# sample data
mydata <- data.frame('A' = c(rnorm(n = 10)),
                     'B' = c(rnorm(n = 10)),
                     'C' = c(rnorm(n = 10)),
                     'D' = c(rnorm(n = 10)),
                     'E' = c(rnorm(n = 10)),
                     'F' = c(rnorm(n = 10)),
                     'G' = c(rnorm(n = 10)),
                     'time' = c(1:10))
# list generated with another package
mylist <- list(c('A','B'), c('C'), c('D', 'E', 'F'), c('G'))
# Line plots for each group from mylist, adding a geom_line for each element in each group
for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time))
  for (x in mylist[[i]]) {
    p <- p +
      geom_line(aes(y = .data[[x]]))
  }
  p <- p +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}
I needed every geom_line to have a different color, so I tried to use cl <- colors() and pick an element from it at random, like this:
cl <- colors()
for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time))
  for (x in mylist[[i]]) {
    p <- p +
      geom_line(aes(y = .data[[x]],
                    color = cl[sample(1:500, 1)])) # here
  }
  p <- p +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}
But the plots don't show the right labels, so I tried to solve this by using scale_color_hue, as I read here.
cl <- colors()
for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time, color = 'green')) # I need to specify a color here for scale_color_hue to work
  for (x in mylist[[i]]) {
    p <- p +
      geom_line(aes(y = .data[[x]],
                    color = cl[sample(1:500, 1)]))
  }
  p <- p +
    scale_color_hue(name = 'Variable',
                    labels = mylist[[i]]) +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}
And it worked! But it feels clunky, especially the random color part: what if it picks the same color for two variables in the same group? So I tried to use a counter variable to index cl with each iteration of the inner loop:
cl <- colors()
t <- 0 # counter used to index cl
for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time, color = 'green'))
  for (x in mylist[[i]]) {
    t <- t + 1 # goes up by one on each iteration of this loop
    p <- p +
      geom_line(aes(y = .data[[x]],
                    color = cl[t]))
  }
  p <- p +
    scale_color_hue(name = 'Variable',
                    labels = mylist[[i]]) +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}
Now every geom_line is the same color, and only the first variable shows in the legend box at the side. Is there a better solution? If possible, I would also like to know why it only works when I use sample(). Thank you.
Edit:
It worked perfectly, as stefan suggested. I didn't know much about lapply, so I was confused at first, but it seems to be a great solution and I'll try to use it more. Also, I moved pal_color inside plot_fun so that it works with any similar list; I hope it's useful:
plot_fun <- function(cols) {
  pal_color <- scales::hue_pal()(sum(lengths(cols)))
  names(pal_color) <- unlist(cols)
  ggplot(data = mydata, aes(x = time)) +
    lapply(cols, function(x) {
      geom_line(aes(y = .data[[x]], color = x))
    }) +
    scale_color_manual(values = pal_color) +
    labs(
      title = paste(cols, collapse = ", "),
      x = "time", y = "value"
    )
}
lapply(mylist, plot_fun)
One option would be to create a named color vector that assigns a unique color to each column you are going to plot. This palette can then be applied via scale_color_manual.
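As a small illustration of the named color vector idea, here is a minimal sketch that derives the palette from the list itself rather than from hard-coded names. It assumes the scales package is installed (it comes with ggplot2) and reuses mylist from the question; all_cols is a helper name I made up for the example.

```r
# Column groups as in the question
mylist <- list(c("A", "B"), c("C"), c("D", "E", "F"), c("G"))

# Flatten the groups into one vector of column names (hypothetical helper name)
all_cols <- unlist(mylist)

# One color per column, taken from ggplot2's default hue palette
pal_color <- scales::hue_pal()(length(all_cols))
names(pal_color) <- all_cols

# Look up the colors for one group by name
pal_color[mylist[[3]]]
```

Because the vector is named, scale_color_manual(values = pal_color) matches colors to the column names, so a column keeps the same color in whichever plot it appears.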
Additionally, I have refactored your code to use lapply instead of for loops, together with a custom plotting function.
library(ggplot2)
set.seed(123)
pal_color <- scales::hue_pal()(7)
names(pal_color) <- LETTERS[seq(7)]
plot_fun <- function(cols) {
  ggplot(data = mydata, aes(x = time)) +
    lapply(cols, function(x) {
      geom_line(aes(y = .data[[x]], color = x))
    }) +
    scale_color_manual(values = pal_color) +
    labs(
      title = paste(cols, collapse = ", "),
      x = "time", y = "value"
    )
}
lapply(mylist, plot_fun)
#> [[1]]
#>
#> [[2]]
#>
#> [[3]]
#>
#> [[4]]
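As for why the counter version gave identical colors: as far as I can tell, aes() does not evaluate its arguments when the layer is created, only when the plot is built for printing. By then the counter t holds its final value for that plot, so every layer evaluates cl[t] to the same color name and you get a single legend entry. With sample(), the expression is re-run for each layer at print time and so returns different values. The sketch below is only a base R analogy of that late evaluation, not ggplot2 itself; loop_funs and apply_funs are names invented for the example.

```r
# Functions created in a for loop all look up the same loop variable k,
# and they only look it up when they are finally called
loop_funs <- list()
for (k in 1:3) {
  loop_funs[[k]] <- function() k
}
loop_result <- vapply(loop_funs, function(f) f(), integer(1))
loop_result  # 3 3 3: every function sees the final value of k

# With lapply, each call gets its own k, so the values stay distinct
apply_funs <- lapply(1:3, function(k) function() k)
apply_result <- vapply(apply_funs, function(f) f(), integer(1))
apply_result  # 1 2 3
```

The same reasoning is why I wrapped geom_line in a function passed to lapply: each layer then refers to its own x instead of a shared loop variable.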