Tags: r, loops, for-loop, ggplot2

Using a for loop to generate multiple lines on a single ggplot


I realize that many people will have an issue with using a for loop to generate this plot rather than the preferred R method of melting the data, but for the sake of how close I already am, please humor me.

I have multiple data sets, which I would like to loop over so that one column from each of them is drawn as a line on a single plot. I've accomplished this so far:


singleden <- function (){
  
  
  line_list <- vector("list", length(paths))
  for (i in (1:length(paths))) {
    # dirname <- dirname(paths[i])
    # len <- nchar(dirname)
    # corename <- (substr(dirname, 97, len))
    
    line_list[[i]] <- geom_line(data = datas[[i]], aes(x = year, y = density, group =1),  
                                stat="identity", color = color[i]) 
  }
  
  label_list <- vector("list", length(paths))
  for (i in (1:length(paths))) { 
    name <- basename(dirname(paths[i]))
    # directname <- dirname(paths[i])
    # #print(paths[i])
    # #print(directname)
    # 
    # name <- (substr(directname, 97, len))
    
    label_list[[i]] <- geom_label_repel(data = datas[[i]] %>% filter (year == min(year)),
                                        aes(label = name,  y = density, x = year), color = color[i])
    
    
  }

   ggplot() + line_list + label_list
  
}


The lines plot correctly, and all is well.


However, adding label_list produces labels that all show the last name in the list. Somehow my two lists seem to be misaligned when ggplot draws them, even though the list itself holds the correct number of layers and the values I would expect when printed.


I've tried using a single for loop for both the lines and the labels and encountered the same issue. I'm wondering whether I need to pair the lists together somehow, but it's unclear to me how I could do that.


Solution

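  • The issue is due to lazy evaluation: aes() does not evaluate label = name when the layer is created. It stores the expression and looks up name only when the plot is built, by which time the loop has finished and name holds its last value. Many questions cover the same pitfall with for loops; see e.g. "for" loop only adds the final ggplot layer for an explanation of the issue.

    In your case this issue can most likely be avoided by moving label = name outside of aes(), so that name is evaluated immediately in each iteration.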

    Using a minimal reproducible example based on the gapminder dataset:

    library(gapminder)
    library(ggplot2)
    library(ggrepel)
    library(dplyr, warn.conflicts = FALSE)
    
    set.seed(123)
    
    datas <- gapminder |>
      filter(country %in% sample(levels(gapminder$country), 5)) |>
      rename(density = lifeExp) |>
      split(~country, drop = TRUE)
    
    color <- gapminder::country_colors[names(datas)]
    
    paths <- file.path(names(datas), names(datas))
    
    singleden <- function() {
      line_list <- vector("list", length(paths))
      for (i in seq_along(paths)) {
        line_list[[i]] <- geom_line(
          data = datas[[i]], aes(x = year, y = density, group = 1),
          stat = "identity", color = color[i]
        )
      }
    
      label_list <- vector("list", length(paths))
      for (i in seq_along(paths)) {
        name <- basename(dirname(paths[i]))
    
        # label is set outside aes(), so name is evaluated right away
        label_list[[i]] <- geom_label_repel(
          data = datas[[i]] |> filter(year == min(year)),
          aes(y = density, x = year), color = color[i], label = name
        )
      }
    
      ggplot() + line_list + label_list
    }
    
    singleden()
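
    To see why moving label = name out of aes() helps, here is a minimal base R sketch of the same mechanism. It is only an analogy, not ggplot2's actual implementation: quote() stands in for aes() storing an expression, and the names env, captured, values and late are made up for this illustration.

    ```r
    # Analogy only: quote() mimics aes() storing an expression instead of a value.
    env <- environment()
    captured <- vector("list", 3) # like aes(label = name): expression stored
    values <- character(3)        # like label = name outside aes(): value stored

    for (i in 1:3) {
      name <- letters[i]
      captured[[i]] <- quote(name) # not evaluated yet
      values[i] <- name            # evaluated right now
    }

    # Evaluate the stored expressions only after the loop has finished,
    # which is roughly what happens when the plot is built.
    late <- vapply(captured, function(e) eval(e, envir = env), character(1))

    print(late)   # every element is the last value of name
    print(values) # one value per iteration
    ```

    The stored expressions all resolve to whatever name holds at the end of the loop, while the values copied inside the loop keep one entry per iteration.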
    

    However, in general I would recommend using lapply to create a list of plots or layers. Each call to the function gets its own environment, so every layer sees its own name, which avoids the issue:

    singleden2 <- function() {
      line_label_list <- lapply(
        seq_along(paths), \(i) {
          name <- basename(dirname(paths[i]))
    
          list(
            geom_line(
              data = datas[[i]], aes(x = year, y = density, group = 1),
              stat = "identity", color = color[i]
            ),
            geom_label_repel(
              data = datas[[i]] %>% filter(year == min(year)),
              aes(label = name, y = density, x = year), color = color[i]
            )
          )
        }
      )
    
      ggplot() + line_label_list
    }
    
    singleden2()
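
    The difference between a for loop and lapply can be reproduced without ggplot2. In this sketch, closures play the role of the layers; the names for_fns, lapply_fns, from_for and from_lapply are illustrative only. Closures created in a for loop all share one environment, while lapply calls the function once per element, each time with a fresh environment.

    ```r
    # Closures created in a for loop share the enclosing environment,
    # so they all see the final value of name.
    for_fns <- vector("list", 3)
    for (i in 1:3) {
      name <- letters[i]
      for_fns[[i]] <- function() name
    }

    # lapply() calls the anonymous function once per element; each call has
    # its own environment with its own copy of name.
    lapply_fns <- lapply(1:3, function(i) {
      name <- letters[i]
      function() name
    })

    from_for <- vapply(for_fns, function(f) f(), character(1))
    from_lapply <- vapply(lapply_fns, function(f) f(), character(1))

    print(from_for)    # the same (last) name three times
    print(from_lapply) # one distinct name per call
    ```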
    

    That said, from my understanding of what you are trying to achieve, my preferred option would be to bind your list of datasets into one data frame and use a single geom_line and geom_label_repel for plotting:

    names(datas) <- names(color) <- basename(dirname(paths))
    
    datas |>
      bind_rows(.id = "name") |>
      ggplot(aes(x = year, y = density, group = name, color = name)) +
      geom_line() +
      geom_label_repel(
        data = ~ filter(., year == min(year), .by = name),
        aes(label = name), direction = "y", show.legend = FALSE
      ) +
      scale_color_manual(values = color, guide = "none")
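
    As a small sketch of what the two dplyr steps do, the toy list below stands in for datas. Its values are made up, and the names toy, combined and first_rows are hypothetical. bind_rows(.id = "name") stacks the data frames and records each list name in a new column, and .by = name makes min(year) apply per series rather than across the whole data frame. Note that the .by argument requires dplyr 1.1.0 or later; on older versions, group_by(name) before filter() is the usual alternative.

    ```r
    library(dplyr, warn.conflicts = FALSE)

    # Toy stand-in for `datas`: a named list of data frames (values are made up).
    toy <- list(
      a = data.frame(year = c(2000, 2001), density = c(1, 2)),
      b = data.frame(year = c(1999, 2000), density = c(3, 4))
    )

    # Stack the data frames; the list names end up in the `name` column.
    combined <- bind_rows(toy, .id = "name")

    # Keep the first year of each series (per-group minimum thanks to .by).
    first_rows <- filter(combined, year == min(year), .by = name)

    print(combined)
    print(first_rows)
    ```

    Without .by = name, only series b would keep a row here, because 1999 is the overall minimum year.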