I realize that many people will object to using a for loop to generate this plot rather than the idiomatic R approach of melting (reshaping) the data into long format, but since I'm already so close, please humor me.
I have multiple data sets that I would like to loop over, generating a single plot that shows one column from each of them. This is what I have so far:
singleden <- function() {
  line_list <- vector("list", length(paths))
  for (i in (1:length(paths))) {
    # dirname <- dirname(paths[i])
    # len <- nchar(dirname)
    # corename <- (substr(dirname, 97, len))
    line_list[[i]] <- geom_line(data = datas[[i]], aes(x = year, y = density, group = 1),
                                stat = "identity", color = color[i])
  }
  label_list <- vector("list", length(paths))
  for (i in (1:length(paths))) {
    name <- basename(dirname(paths[i]))
    # directname <- dirname(paths[i])
    # #print(paths[i])
    # #print(directname)
    #
    # name <- (substr(directname, 97, len))
    label_list[[i]] <- geom_label_repel(data = datas[[i]] %>% filter(year == min(year)),
                                        aes(label = name, y = density, x = year), color = color[i])
  }
  ggplot() + line_list + label_list
}
The lines plot correctly, and all is well.
However, when I add label_list, every label shows only the last name. Somehow my two lists seem to be misaligned when ggplot draws them, even though each list holds the number of elements and the values I expect when I print it.
I've also tried a single for loop for both the lines and the labels and ran into the same issue. I wonder whether I need to pair the lists together somehow, but I'm not sure how to do that.
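The issue is due to lazy evaluation: aes() does not evaluate name when the layer is created, it only records the expression and looks it up when the plot is built, by which time the loop has finished and name holds its last value. Many questions run into the same issue when using for loops; see e.g. "for" loop only adds the final ggplot layer for an explanation of the issue.
In your case this issue can most likely be avoided by moving label = name outside of aes(), so that it is evaluated immediately as a fixed parameter of each layer.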
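To see the lazy evaluation without building a plot, here is a minimal sketch using rlang::quo(). aes() stores each mapping as a quosure (an expression plus the environment it was created in), which is what quo() creates as well, so the behaviour is analogous. The vector name_values and the list label_quos are hypothetical, made up for illustration.

```r
library(rlang)

name_values <- c("a", "b", "c")  # hypothetical labels, one per data set

# Mimic the question's loop: capture `name` unevaluated on every iteration
label_quos <- vector("list", length(name_values))
for (i in seq_along(name_values)) {
  name <- name_values[i]
  label_quos[[i]] <- quo(name)  # stores the symbol `name` plus the loop's environment
}

# Evaluation happens only now, after the loop has finished. All quosures
# point at the same environment, where `name` is "c".
labels <- vapply(label_quos, eval_tidy, character(1))
print(labels)  # "c" "c" "c"
```

All three quosures share one environment, which is exactly why every label ends up with the last name.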
Using a minimal reproducible example based on the gapminder dataset:
library(gapminder)
library(ggplot2)
library(ggrepel)
library(dplyr, warn.conflicts = FALSE)
set.seed(123)
datas <- gapminder |>
  filter(country %in% sample(levels(gapminder$country), 5)) |>
  rename(density = lifeExp) |>
  split(~country, drop = TRUE)
color <- gapminder::country_colors[names(datas)]
# fake paths so that basename(dirname(paths)) returns the country name
paths <- file.path(names(datas), names(datas))
singleden <- function() {
  line_list <- vector("list", length(paths))
  for (i in (1:length(paths))) {
    line_list[[i]] <- geom_line(
      data = datas[[i]], aes(x = year, y = density, group = 1),
      stat = "identity", color = color[i]
    )
  }
  label_list <- vector("list", length(paths))
  for (i in (1:length(paths))) {
    name <- basename(dirname(paths[i]))
    label_list[[i]] <- geom_label_repel(
      data = datas[[i]] %>% filter(year == min(year)),
      aes(y = density, x = year), color = color[i],
      label = name  # outside aes(): evaluated now, once per iteration
    )
  }
  ggplot() + line_list + label_list
}
singleden()
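One way to check the fix is to inspect the built layers with ggplot2::layer_data(). The sketch below uses geom_text() as a stand-in for geom_label_repel() (an assumption, to avoid the ggrepel dependency; label is a required aesthetic of both) and hypothetical one-row data frames. It compares label = name inside and outside aes() in the same for loop.

```r
library(ggplot2)

# Hypothetical toy data: one single-row data frame per "data set"
toy <- list(
  data.frame(year = 1952, density = 1),
  data.frame(year = 1952, density = 2),
  data.frame(year = 1952, density = 3)
)
toy_names <- c("a", "b", "c")

inside_aes <- list()   # label mapped inside aes(): looked up when the plot is built
outside_aes <- list()  # label passed as a fixed parameter: value stored immediately
for (i in seq_along(toy)) {
  name <- toy_names[i]
  inside_aes[[i]] <- geom_text(data = toy[[i]], aes(x = year, y = density, label = name))
  outside_aes[[i]] <- geom_text(data = toy[[i]], aes(x = year, y = density), label = name)
}

# Build a plot from a list of layers and return the label of each layer
built_labels <- function(layers) {
  p <- ggplot() + layers
  vapply(seq_along(layers), function(j) as.character(layer_data(p, j)$label), character(1))
}

print(built_labels(inside_aes))   # every layer gets the last name
print(built_labels(outside_aes))  # each layer keeps its own name
```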
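In general, however, I would recommend using lapply() to create a list of plots or layers via a "loop". Because each call of the anonymous function gets its own environment, name keeps its own value even inside aes(), which avoids the issue: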
singleden2 <- function() {
  line_label_list <- lapply(
    seq_along(paths), \(i) {
      name <- basename(dirname(paths[i]))
      list(
        geom_line(
          data = datas[[i]], aes(x = year, y = density, group = 1),
          stat = "identity", color = color[i]
        ),
        geom_label_repel(
          data = datas[[i]] %>% filter(year == min(year)),
          aes(label = name, y = density, x = year), color = color[i]
        )
      )
    }
  )
  ggplot() + line_label_list
}
singleden2()
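The same kind of layer_data() check can confirm that lapply() keeps label = name inside aes() working. This is a minimal sketch with hypothetical toy data, again using geom_text() as an assumed stand-in for geom_label_repel().

```r
library(ggplot2)

# Hypothetical toy data: one single-row data frame per "data set"
toy <- list(
  data.frame(year = 1952, density = 1),
  data.frame(year = 1952, density = 2),
  data.frame(year = 1952, density = 3)
)
toy_names <- c("a", "b", "c")

# Same pattern as singleden2(): label = name stays inside aes()
layers <- lapply(seq_along(toy), \(i) {
  name <- toy_names[i]  # lives in this call's own environment
  geom_text(data = toy[[i]], aes(x = year, y = density, label = name))
})

p <- ggplot() + layers
labels <- vapply(seq_along(layers), \(j) as.character(layer_data(p, j)$label), character(1))
print(labels)  # "a" "b" "c"
```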
But from my understanding of what you are trying to achieve, my preferred option would be to bind your list of datasets into one and use just one geom_line() and one geom_label_repel() for plotting:
names(datas) <- names(color) <- basename(dirname(paths))
datas |>
  bind_rows(.id = "name") |>
  ggplot(aes(x = year, y = density, group = name, color = name)) +
  geom_line() +
  geom_label_repel(
    # filter(.by = ) requires dplyr >= 1.1.0
    data = ~ filter(., year == min(year), .by = name),
    aes(label = name), direction = "y", show.legend = FALSE
  ) +
  scale_color_manual(values = color, guide = "none")
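To make the data step of this approach concrete, here is a small sketch of what bind_rows(.id = "name") and filter(.by = name) produce. The list datas_toy is hypothetical toy data standing in for datas, and .by assumes dplyr >= 1.1.0.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy list standing in for `datas`: one data frame per data set
datas_toy <- list(
  A = data.frame(year = c(2000, 2005), density = c(1.0, 1.5)),
  B = data.frame(year = c(2000, 2005), density = c(2.0, 2.5))
)

# .id turns the list names into a column, so a single group/color mapping covers every data set
combined <- bind_rows(datas_toy, .id = "name")

# First year per data set in one call: the rows that get a label
first_rows <- filter(combined, year == min(year), .by = name)
print(first_rows)
```

Because the names now live in a column, no loop is needed: ggplot() maps name to group and color once, and the label layer only needs the filtered rows.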