Iterate over filtered values of a list


I have a dataframe:

set.seed(1)
d <- data.frame(year = c(2001:2005, 2001:2005, 2001:2005),
                income = sample(2000:10000, 15, replace = TRUE),
                gender = sample(1:2, 15, replace = TRUE),
                education = sample(1:3, 15, replace = TRUE)
)

Since the actual dataframe has more variables than just gender and education, I want to write a function that plots the income kernel density of each subgroup against the density of the whole sample, for every level of gender and education, and saves each subgroup's plot as a PDF. Take gender == 1 as an example:

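library(dplyr)
library(ggplot2)

male <- d %>% filter(gender == 1)

density_all <- density(d$income)
# Evaluate the subgroup density on the same x grid as the full sample,
# so that both curves can share one x column
density_male <- density(male$income,
                        from = min(density_all$x),
                        to = max(density_all$x),
                        n = length(density_all$x))

d_density <- data.frame(x = density_all$x,
                        density_all = density_all$y,
                        density_male = density_male$y)

plot <- ggplot(d_density, aes(x)) +
  geom_line(aes(y = density_all), color = "red") +
  geom_line(aes(y = density_male), color = "blue")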

ggsave("subgroup_name.pdf", plot, width = 300, height = 250, units = "mm")  

I have thought about converting the dataframe from long to wide format, but the subgroups won't all have the same length. I also thought about a loop within a loop, i.e. an inner loop over the values of a variable (gender == 1 or 2) and an outer loop over the variables (gender and education). I'm not sure which option is better or how exactly to carry it out.

Your suggestions will be highly appreciated.


Solution

  • This stores all the plots in a list of lists; just amend the initial vector to cover the variables you want:

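    library(purrr)
    library(ggplot2)

    plots <- map(.x = c("gender", "education"),
                 .f = \(categ){
                   map(.x = unique(d[[categ]]),
                       .f = \(lev){
                         # Map colour to labels inside aes() and assign the actual colours in the scale
                         ggplot(d, aes(x = income)) +
                           geom_density(aes(colour = "all")) +
                           geom_density(data = d[d[[categ]] == lev, ], aes(colour = "filtered")) +
                           scale_colour_manual(name = categ, values = c("all" = "red", "filtered" = "blue"))
                       })
                 })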
    

    You can use walk() on the nested list to apply your save-to-PDF function, but I assume you'll first want to adjust the aesthetics, set titles, hide the legend, etc.
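
    As a rough sketch of that saving step, one option is to name both levels of the nested list and use purrr's iwalk(), which passes each element together with its name. The toy data, the out_dir folder, the plot titles and the "variable_level.pdf" file naming below are assumptions made for illustration, not part of the original answer, and the plotting code is a simplified variant of the one above.

    ```r
    library(purrr)
    library(ggplot2)

    # Toy data mirroring the question's data frame
    set.seed(1)
    d <- data.frame(income    = sample(2000:10000, 15, replace = TRUE),
                    gender    = sample(1:2, 15, replace = TRUE),
                    education = sample(1:3, 15, replace = TRUE))

    vars <- c("gender", "education")

    # Same nested structure as above, but with names on both levels
    plots <- map(set_names(vars), \(categ) {
      levs <- sort(unique(d[[categ]]))
      map(set_names(levs, as.character(levs)), \(lev) {
        ggplot(d, aes(x = income)) +
          geom_density(colour = "red") +
          geom_density(data = d[d[[categ]] == lev, ], colour = "blue") +
          ggtitle(paste(categ, "=", lev))
      })
    })

    out_dir <- tempdir()  # assumption: replace with your own output folder

    # Outer iwalk() supplies the variable name, inner iwalk() the level
    iwalk(plots, \(plot_list, categ) {
      iwalk(plot_list, \(plt, lev) {
        ggsave(file.path(out_dir, paste0(categ, "_", lev, ".pdf")),
               plt, width = 300, height = 250, units = "mm")
      })
    })
    ```

    Naming the list elements up front keeps the file names tied to the subgroup they show, so nothing depends on the order in which unique() happens to return the levels.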

    EDIT: Version allowing for plotting of the difference as well:

    library(purrr)
    library(ggplot2)

    plots <- map(.x = c("gender", "education"),
                 .f = \(categ){
                   # Density of the full sample; its x grid is reused for every subgroup
                   dens_all <- density(d$income)
                   tmpdf <- data.frame(x = dens_all$x, y = dens_all$y)
                   map(.x = unique(d[[categ]]),
                       .f = \(lev){
                         # Evaluate the subgroup density on the same grid so the curves line up
                         tmpdf$y2 <- density(d[d[[categ]] == lev, "income"],
                                             from = min(tmpdf$x), to = max(tmpdf$x),
                                             n = nrow(tmpdf))$y
                         tmpdf$y3 <- tmpdf$y - tmpdf$y2
                         ggplot(tmpdf) +
                           geom_line(aes(x = x, y = y, colour = "total")) +
                           geom_line(aes(x = x, y = y2, colour = "filtered")) +
                           geom_line(aes(x = x, y = y3, colour = "difference")) +
                           scale_color_manual(name = categ, values = c("total" = "red",
                                                                       "filtered" = "blue",
                                                                       "difference" = "black"))
                       })
                 })