r for-loop ggplot2 boxplot

Dynamic iteration through dataframe to create customized boxplots with R


Morning,

I was trying to create a for loop to iterate over a data frame and create boxplots for the numerical variables. Unfortunately, I got stuck with the iteration.
The code below shows what I have so far. I was planning to save the plots in a list and later, in a second loop, plot them all at once in an .Rmd file.

The thing is, I have two dynamic values in the iteration. First, the name of each plot variable should be of the form plt_x, where x is the column number in the data frame. Second, the title of each boxplot should have the column name pasted in.

The boxplots work perfectly fine without the loop, and so does creating plot_name, but for some reason the for loop returns all kinds of errors.

Can someone help? I may have a logical error with the plot_safe variable, but after all that thinking I can't figure out what it is.

plot_safe = list()  
for (col in names(data)) {
  if (is.numeric(data[[col]])) { 
    
    max_val = max(data[[col]])
    min_val = min(data[[col]])
    median_val = median(data[[col]])
    iqr_val = IQR(data[[col]])     
    
    
    plot_name = paste("plt_", grep(col, names(data)), sep = "")
    plot_safe[[plot_name]] = 
      ggplot(data, aes(x =NA,
                       y = data[[col]])) +
      stat_boxplot(geom = "errorbar", color = "grey20") +
      geom_boxplot() +
      stat_summary(fun.y = mean, geom = "point", colour = "red") +
      scale_color_brewer(palette = "Dark2", guide = FALSE) +
      labs(title = sprintf("Boxplot of the Variable: %s", col)) +
      theme_bw() +
      annotate("text", x = 0.5, y = min_val, label = min_val, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = max_val, label = max_val, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = median_val, label = median_val, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = median_val - iqr_val/2, label = median_val - iqr_val/2, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = median_val + iqr_val/2, label = median_val + iqr_val/2, color = "grey50", size = 3) +
      theme(axis.ticks.x = element_blank(), axis.title.x = element_blank(), axis.text.x = element_blank())

  }
}

for (plot in plot_safe) {
  plot_safe[sprintf("%s",plot)]
}

Solution

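  • Especially when it comes to creating a list of ggplots, you can get your result more easily with lapply than with a for loop. I would also suggest putting your plotting code in a function, which makes testing and debugging easier. Finally, I simplified your code a bit by putting the boxplot stats in a data frame so that the labels can be added with a single geom_text. Note that boxplot.stats() returns the whisker ends, the hinges, and the median, so the "min" and "max" labels mark the whisker ends, which differ from the raw extremes when a column has outliers.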

    Note the use of the .data pronoun in aes(), which is the recommended way to map column names passed as character strings to aesthetics.

    Using iris as example data:

    library(ggplot2)
    
    data <- iris
    
    plot_fun <- function(col) {
      if (is.numeric(data[[col]])) {
        box_stats <- data.frame(
          stats = c("min", "p25", "median", "p75", "max"),
          value = boxplot.stats(data[[col]])[["stats"]]
        )
    
        ggplot(data, aes(
          x = NA,
          y = .data[[col]]
        )) +
          stat_boxplot(geom = "errorbar", color = "grey20") +
          geom_boxplot() +
          stat_summary(fun = mean, geom = "point", colour = "red") +
          scale_color_brewer(palette = "Dark2", guide = "none") +
          labs(
            title = sprintf("Boxplot of the Variable: %s", col)
          ) +
          theme_bw() +
          geom_text(
            data = box_stats,
            aes(x = .5, y = value, label = value),
            color = "grey50", size = 3
          ) +
          theme(
            axis.ticks.x = element_blank(),
            axis.title.x = element_blank(),
            axis.text.x = element_blank()
          )
      }
    }
    plot_safe <- lapply(
      names(data),
      plot_fun
    )
    
    names(plot_safe) <- paste("plt", names(data), sep = "_")
    
    plot_safe$plt_Sepal.Length
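
The question also asked for names of the form plt_x (x being the column number) and for a second loop that draws every plot. Here is a minimal sketch of one way that could look. It assumes iris as example data again, and simple_boxplot() is a hypothetical, stripped-down stand-in for plot_fun above, not part of the original answer. Two things to keep in mind: grep(col, names(data)) treats col as a regular expression and can return more than one index, so the sketch takes the column positions directly; and ggplot objects are not auto-printed inside a for loop, so print() is called explicitly.

```r
library(ggplot2)

data <- iris

# Hypothetical, stripped-down stand-in for plot_fun() above
simple_boxplot <- function(col) {
  ggplot(data, aes(x = "", y = .data[[col]])) +
    geom_boxplot() +
    labs(title = sprintf("Boxplot of the Variable: %s", col))
}

# Positions of the numeric columns in the data frame
num_idx <- which(vapply(data, is.numeric, logical(1)))

# Build plots only for numeric columns, so the list has no NULL entries
plots <- lapply(names(data)[num_idx], simple_boxplot)

# Name each plot plt_<column number>
names(plots) <- paste0("plt_", num_idx)

# Inside a for loop nothing is auto-printed, so call print() on each plot
for (p in plots) {
  print(p)
}
```

The same loop should work inside an .Rmd chunk, where each print() call adds one figure to the output. If you keep the lapply over all columns as in the answer, Filter(Negate(is.null), plot_safe) would drop the NULL entry left by the non-numeric column before looping.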