Morning,
I was trying to write a for loop that iterates over a data frame and creates boxplots for the numerical variables. Unfortunately, I got stuck with the iteration.
The code below shows what I have so far. I was planning to save the plots in a list and later, in a second loop, plot them all at once in an .Rmd file.
The thing is, I have two dynamic values in the iteration. First, the name of the plot variable should have the form plt_x, where x stands for the number of the column in the data frame. Second, the title of each boxplot should have the column name pasted in.
The boxplots work perfectly fine without the loop, and so does creating plot_name, but for some reason the for loop returns all kinds of errors.
Can someone help? I may have a logical error with the plot_safe variable, but after all that thinking I can't figure out what it is.
plot_safe = list()
for (col in names(data)) {
  if (is.numeric(data[[col]])) {
    max_val = max(data[[col]])
    min_val = min(data[[col]])
    median_val = median(data[[col]])
    iqr_val = IQR(data[[col]])
    plot_name = paste("plt_", grep(col, names(data)), sep = "")
    plot_safe[[plot_name]] =
      ggplot(data, aes(x =NA,
                       y = data[[col]])) +
      stat_boxplot(geom = "errorbar", color = "grey20") +
      geom_boxplot() +
      stat_summary(fun.y = mean, geom = "point", colour = "red") +
      scale_color_brewer(palette = "Dark2", guide = FALSE) +
      labs(title = sprintf("Boxplot of the Variable: %s", col)) +
      theme_bw() +
      annotate("text", x = 0.5, y = min_val, label = min_val, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = max_val, label = max_val, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = median_val, label = median_val, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = median_val - iqr_val/2, label = median_val - iqr_val/2, color = "grey50", size = 3) +
      annotate("text", x = 0.5, y = median_val + iqr_val/2, label = median_val + iqr_val/2, color = "grey50", size = 3) +
      theme(axis.ticks.x = element_blank(), axis.title.x = element_blank(), axis.text.x = element_blank())
  }
}
for (plot in plot_safe) {
  plot_safe[sprintf("%s",plot)]
}
Especially when it comes to creating a list of ggplots, you could achieve your result more easily using lapply instead of a for loop. I would also suggest putting your plotting code in a function, which makes testing and debugging easier. Finally, I simplified your code a bit by putting the boxplot stats in a data frame so that we can add the labels using just one geom_text.
Note the use of the .data pronoun in aes(), which is the recommended way to map column names passed as character strings to aesthetics.
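As a small illustration of the .data pronoun mentioned in the note, here is a minimal sketch. The helper name box_for is made up for this example and is not part of ggplot2; it only shows how a column name held in a string can be mapped to the y aesthetic.

```r
library(ggplot2)

# Hypothetical helper, for illustration only: draws a boxplot of the
# column whose name is passed as a character string.
box_for <- function(df, col) {
  ggplot(df, aes(x = "", y = .data[[col]])) +
    geom_boxplot() +
    labs(title = col)
}

p <- box_for(iris, "Petal.Width")

# .data[[col]] is looked up in the data frame handed to ggplot(),
# so the plot does not rely on a global object named `data`.
stats <- layer_data(p, 1)
```

Because the mapping lives inside a function, each call also keeps its own value of col, which should help when many plots are built one after another.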
Using iris as example data:
library(ggplot2)

data <- iris

# Returns a boxplot for a numeric column and NULL for any other column
plot_fun <- function(col) {
  if (is.numeric(data[[col]])) {
    # boxplot.stats()$stats holds: lower whisker end, lower hinge,
    # median, upper hinge, upper whisker end
    box_stats <- data.frame(
      stats = c("min", "p25", "median", "p75", "max"),
      value = boxplot.stats(data[[col]])[["stats"]]
    )
    ggplot(data, aes(
      x = NA,
      y = .data[[col]]
    )) +
      stat_boxplot(geom = "errorbar", color = "grey20") +
      geom_boxplot() +
      stat_summary(fun = mean, geom = "point", colour = "red") +
      scale_color_brewer(palette = "Dark2", guide = "none") +
      labs(
        title = sprintf("Boxplot of the Variable: %s", col)
      ) +
      theme_bw() +
      # One text layer for all five labels
      geom_text(
        data = box_stats,
        aes(x = .5, y = value, label = value),
        color = "grey50", size = 3
      ) +
      theme(
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank()
      )
  }
}

# One list element per column
plot_safe <- lapply(
  names(data),
  plot_fun
)
names(plot_safe) <- paste("plt", names(data), sep = "_")

plot_safe$plt_Sepal.Length
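To plot everything at once later in the .Rmd file, as planned in the question, something along the following lines might work. This is only a sketch: simple_box is a hypothetical, stripped-down stand-in for plot_fun so that the example runs on its own, and the plt_1, plt_2, ... names follow the column-number scheme from the question rather than the column names used above.

```r
library(ggplot2)

data <- iris

# Hypothetical, simplified stand-in for plot_fun:
# a boxplot for numeric columns, NULL for everything else.
simple_box <- function(col) {
  if (!is.numeric(data[[col]])) return(NULL)
  ggplot(data, aes(x = "", y = .data[[col]])) +
    geom_boxplot() +
    labs(title = sprintf("Boxplot of the Variable: %s", col))
}

plot_safe <- lapply(names(data), simple_box)

# Name each entry by column position (plt_1, plt_2, ...)
names(plot_safe) <- paste0("plt_", seq_along(data))

# Non-numeric columns (Species in iris) leave NULL entries; drop them
plot_safe <- Filter(Negate(is.null), plot_safe)

# Inside a loop, ggplot objects have to be printed explicitly
for (p in plot_safe) {
  print(p)
}
```

The explicit print() matters because a ggplot object is only drawn automatically when it is the visible result of a top-level expression, which is not the case inside a for loop.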