Tags: r, for-loop, ggplot2, label

Multiple plots with variable geoms and different colors for each one using a loop in ggplot2 without randomizing


I'm trying to generate multiple plots, each with a variable number of geom_line layers, based on a list generated by another package that groups the column names of my original data. Both my data and the list are huge, so I tried to solve this with a nested loop. It kind of worked, but I'm struggling to find a better way to assign a unique color to each geom_line without randomizing the color. I'm new to coding, so this might not be clear; I apologize in advance.

This is the first thing I tried:

# required package
library(tidyverse)

# sample data
mydata <- data.frame('A' = c(rnorm(n = 10)), 
                     'B' = c(rnorm(n = 10)), 
                     'C' = c(rnorm(n = 10)), 
                     'D' = c(rnorm(n = 10)), 
                     'E' = c(rnorm(n = 10)), 
                     'F' = c(rnorm(n = 10)),
                     'G' = c(rnorm(n = 10)),
                     'time' = c(1:10))

# list generated with another package
mylist <- list(c('A','B'), c('C'), c('D', 'E', 'F'), c('G'))

# Line plots for each group from mylist, adding a geom_line for each element in each group
for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time))
  for (x in mylist[[i]]) {
    p <- p +
      geom_line(aes(y = .data[[x]]))
  }
  p <- p +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}

I needed every geom_line to have a different color, so I tried to use cl <- colors() and index it randomly, like this:

cl <- colors() 

for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time))
  for (x in mylist[[i]]) {
    p <- p +
      geom_line(aes(y = .data[[x]], 
                    color = cl[sample(1:500, 1)])) # here
  }
  p <- p +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}

But the plots don't show the right labels, so I tried to solve this by using scale_color_hue, as I read here.

cl <- colors()

for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time, color = 'green')) # I need to specify a color here for scale_color_hue to work
  for (x in mylist[[i]]) {
    p <- p +
      geom_line(aes(y = .data[[x]], 
                    color = cl[sample(1:500, 1)]))
  }
  p <- p +
    scale_color_hue(name = 'Variable',
                    labels = mylist[[i]]) +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}

And it worked! But it feels clunky, especially the random color part: what if it picks the same color for two variables in the same group? So I tried to use a counter variable to index cl on each iteration of the inner loop:

cl <- colors()

t <- 0 #variable used to evaluate cl 

for (i in 1:4) {
  p <- ggplot(data = mydata, aes(x = time, color = 'green'))
  for (x in mylist[[i]]) {
    t <- t + 1 #it goes up by a unit each iteration of this loop
    p <- p +
      geom_line(aes(y = .data[[x]],
                    color = cl[t]))
  }
  p <- p +
    scale_color_hue(name = 'Variable',
                    labels = mylist[[i]]) +
    labs(title = paste(mylist[[i]], collapse = ', '),
         x = 'time', y = 'value')
  print(p)
}

Now every geom_line is the same color, and only the first variable shows in the legend. Is there a better solution? If possible, I would also like to know why it only works when I use sample(). Thank you.

Edit: It worked perfectly, as stefan suggested. I didn't know much about lapply, so I was confused, but it seems to be a great solution and I'll try to use it more. Also, I moved pal_color inside plot_fun so that it works with any similar list; I hope it's useful:

plot_fun <- function(cols) {
  pal_color <- scales::hue_pal()(sum(lengths(cols)))
  names(pal_color) <- unlist(cols)
  ggplot(data = mydata, aes(x = time)) +
    lapply(cols, function(x) {
      geom_line(aes(y = .data[[x]], color = x))
    }) +
    scale_color_manual(values = pal_color) +
    labs(
      title = paste(cols, collapse = ", "),
      x = "time", y = "value"
    )
}

lapply(mylist, plot_fun)
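
Building the palette inside plot_fun means the same few colors get reused from one plot to the next. If each variable should instead keep one color across all plots, the named palette could be built once from the whole list. This is only a sketch under that assumption; `all_vars` is a name introduced here for illustration, and `unique()` is added in case a variable appears in more than one group.

```r
library(scales)

# Same grouping list as in the question
mylist <- list(c("A", "B"), c("C"), c("D", "E", "F"), c("G"))

# One color per variable across the whole list, named by variable
all_vars  <- unique(unlist(mylist))
pal_color <- setNames(hue_pal()(length(all_vars)), all_vars)

# Colors that the third group (D, E, F) would pick up
pal_color[mylist[[3]]]
```

With a vector like this defined outside plot_fun, `scale_color_manual(values = pal_color)` can look colors up by variable name in every plot.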

Solution

  • One option would be to create a named color vector which assigns a unique color to each column you are going to plot. This palette can then be applied via scale_color_manual.

    Additionally, I have refactored your code to use lapply instead of for loops, together with a custom plotting function.

    library(ggplot2)
    
    set.seed(123)
    
    # mydata and mylist are the sample objects defined in the question
    
    pal_color <- scales::hue_pal()(7)
    names(pal_color) <- LETTERS[seq(7)]
    
    plot_fun <- function(cols) {
      ggplot(data = mydata, aes(x = time)) +
        lapply(cols, function(x) {
          geom_line(aes(y = .data[[x]], color = x))
        }) +
        scale_color_manual(values = pal_color) +
        labs(
          title = paste(cols, collapse = ", "),
          x = "time", y = "value"
        )
    }
    
    lapply(mylist, plot_fun)
    #> [[1]], [[2]], [[3]], [[4]]: one line plot per group in mylist (plot images not shown)
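
On the question of why the counter version shows a single color while sample() appears to work: as far as I understand it, aes() stores its expressions unevaluated, and they are only evaluated when the plot is built (for example by print()). In a for loop, every layer then sees whatever value the loop variables hold at that moment. With cl[t], all layers of a plot get the same final value of t, so there is one color level and one legend entry. cl[sample(1:500, 1)] is re-run for each layer at build time, so it usually produces different values. Note also that a string inside aes() is treated as a data value, not as a literal color. The sketch below uses a small made-up data frame (`df`, `p_loop`, and `p_apply` are names chosen for illustration) to compare a for loop with lapply.

```r
library(ggplot2)

# Tiny made-up data set, used only for this illustration
df <- data.frame(time = 1:3, A = c(1, 2, 3), B = c(3, 2, 1))

# for loop: aes() stores the expressions unevaluated; `x` is only looked up
# when the plot is built, by which time the loop has finished and x == "B"
p_loop <- ggplot(df, aes(x = time))
for (x in c("A", "B")) {
  p_loop <- p_loop + geom_line(aes(y = .data[[x]], color = x))
}

# lapply: every call of the anonymous function has its own environment,
# so each layer keeps the value of `x` it was created with
p_apply <- ggplot(df, aes(x = time)) +
  lapply(c("A", "B"), function(x) {
    geom_line(aes(y = .data[[x]], color = x))
  })

# y values that the first layer of each plot actually draws
y_loop  <- layer_data(p_loop, 1)$y   # expected to equal df$B, not df$A
y_apply <- layer_data(p_apply, 1)$y  # expected to equal df$A
```

The same effect applies to `.data[[x]]` in the loop versions from the question, so it is worth checking that each line really shows its own column. Wrapping the layer creation in a function, as lapply does, is one way around this.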