r ggplot2 plot eval

ggplots stored in plot list to respect variable values at time of plot generation within for loop


I have an elaborate plot routine that generates box plots with additional layers of scatter and adds them to a plot list.

The routine generates correct plots if they are created during the for loop directly via print(current_plot_complete).

However, if they are added to a plot list during the for loop which is printed only at the end, then the plots are incorrect: the final indices are used to generate all plots (instead of the current index at the time the plot is generated). This seems to be default ggplot2 behavior and I am looking for a solution to circumvent it in the current use case.

The issue seems to be within y = eval(parse(text=(paste0(COL_i)))) where the global environment is used (and thus the final index value) instead of the current values at the time of loop execution.
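The effect can be reproduced in a much smaller sketch. It assumes only ggplot2 and a hypothetical two-column data frame `df` (not the data from the MWE); the names `plots` and `col` are illustrative as well. `aes()` stores the expression unevaluated, so the loop variable is only looked up when the plot is built:

```r
library(ggplot2)

# Hypothetical toy data, not the data_temp from the MWE
df <- data.frame(a = 1:3, b = 4:6)

plots <- list()
for (col in c("a", "b")) {
  # aes() only records the expression; `col` is not looked up yet
  plots[[col]] <- ggplot(df) +
    geom_point(aes(x = 1, y = eval(parse(text = col))))
}

# After the loop `col` is "b", so the plot stored under "a"
# reads column b when it is finally built
y_of_first_plot <- layer_data(plots[["a"]])$y
y_of_first_plot
```

Printing inside the loop builds each plot while `col` still holds the intended value, which would explain why the immediate print() looks correct.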

I tried various approaches to make eval() use the correct variable values, e.g. local(…) or specifying the environment – but without success.

A very simplified MWE is provided below.


MWE

The original routine is much more elaborate than this MWE, so the for loop cannot easily be replaced with members of the apply family.

library(dplyr)    # %>%, filter(), select()
library(tidyr)    # gather()
library(ggplot2)

# create some random data
data_temp <- data.frame(
  "a" = sample(x = 1:100, size = 50),
  "b" = rnorm(n = 50, mean = 45, sd = 1),
  "c" = sample(x = 20:70, size = 50),
  "d" = rnorm(n = 50, mean = 40, sd = 15),
  "e" = rnorm(n = 50, mean = 50, sd = 10),
  "f" = rnorm(n = 50, mean = 45, sd = 1),
  "g" = sample(x = 20:70, size = 50)
)
COLs_current <- c("a", "b", "c", "d", "e") # define COLs of data to include in box plots
choice_COLs <- c("a", "d")      # define COLs of data to add scatter to

plot_list <- list(NA)
plot_index <- 1

for (COL_i in choice_COLs) {

  COL_i_index <- which(COL_i == COLs_current)

  # Generate "basis boxplot" (to plot scatterplot on top)
  boxplot_scores <- data_temp %>% 
    gather(COL, score, all_of(COLs_current)) %>%
    ggplot(aes(x = COL, y = score)) +
    geom_boxplot() 

  # Get relevant data of COL_i for scattering: data of 4th quartile
  quartile_values <- quantile(data_temp[[COL_i]])
  threshold <- quartile_values["75%"]           # threshold = 3. quartile value
  data_temp_filtered <- data_temp %>%
    filter(data_temp[[COL_i]] > threshold) %>%  # filter the data of the 4th quartile
    dplyr::select(all_of(COLs_current))         # all_of() avoids the external-vector deprecation warning

  # Create layer of scatter for 4th quartile of COL_i
  scatter_COL_i <- geom_point(data=data_temp_filtered, mapping = aes(x = COL_i_index, y = eval(parse(text=(paste0(COL_i))))), color= "orange")

  # add geom objects to create final plot for COL_i
  current_plot_complete <- boxplot_scores + scatter_COL_i 

  print(current_plot_complete)

  plot_list[[plot_index]] <- current_plot_complete 
  plot_index <- plot_index + 1
}

plot_list

Solution

  • Here is a solution that works, although it does not explain why your loop version fails:

    # Define the function first, then call it via lapply (see the end of this block)
    temporary_function <- function(COL_i){
        COL_i_index <- which(COL_i == COLs_current)

        # Generate "basis boxplot" (to plot scatterplot on top)
        boxplot_scores <- data_temp %>% 
            gather(COL, score, all_of(COLs_current)) %>%
            ggplot(aes(x = COL, y = score)) +
            geom_boxplot() 

        # Get relevant data of COL_i for scattering: data of 4th quartile
        quartile_values <- quantile(data_temp[[COL_i]])
        threshold <- quartile_values["75%"]           # threshold = 3. quartile value
        data_temp_filtered <- data_temp %>%
            filter(data_temp[[COL_i]] > threshold) %>%  # filter the data of the 4th quartile
            dplyr::select(all_of(COLs_current))

        # Create layer of scatter for 4th quartile of COL_i
        scatter <- geom_point(data=data_temp_filtered,
                              mapping = aes(x = COL_i_index,
                                            y = eval(parse(text=(paste0(COL_i))))),
                              color= "orange")

        # add geom objects to create final plot for COL_i
        current_plot_complete <-  boxplot_scores + scatter

        return(current_plot_complete)
    }

    # One plot per column; each call runs in its own function environment
    l <- lapply(choice_COLs, temporary_function)
    

    With lapply the problem does not occur. This approach is inspired by this post.
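If the for loop has to stay, one possible alternative is to inject the current values into `aes()` with `!!`, so that nothing is left to be looked up after the loop has finished. The sketch below is not the original routine: it uses a hypothetical data frame `toy` and illustrative names (`cols`, `col_name`), and it assumes a ggplot2 version with tidy evaluation support in `aes()` (3.0.0 or later). A likely reason the lapply version works is that each function call has its own environment, which still holds the right values when the plot is built.

```r
library(ggplot2)

set.seed(1)  # reproducible toy data (hypothetical, not the original data_temp)
toy <- data.frame(a = rnorm(20), d = rnorm(20, mean = 5))
cols <- c("a", "d")

plot_list <- list()
for (i in seq_along(cols)) {
  col_name <- cols[i]
  # `!!` substitutes the current values of `i` and `col_name` into the
  # aesthetic expressions immediately, instead of storing the variable names
  plot_list[[col_name]] <- ggplot(toy) +
    geom_point(aes(x = !!i, y = .data[[!!col_name]]), color = "orange")
}

# Each stored plot keeps its own column, even after the loop has ended
y_a <- layer_data(plot_list[["a"]])$y
y_d <- layer_data(plot_list[["d"]])$y
```

Applied to the MWE, this would correspond to writing the mapping as `aes(x = !!COL_i_index, y = .data[[!!COL_i]])` in place of the `eval(parse(...))` construct.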