Tags: r, ggplot2, dplyr, tidyeval

Calling many variables in a for loop with a dplyr/ggplot function


Sometimes, when performing exploratory analysis or producing reports, we want to plot the univariate distributions of many variables. I could do this by faceting the plot after some tidy trick, but some of the variables are ordered factors and I want to keep them ordered on the plots.

So, to do this more efficiently, I built a simple dplyr/ggplot-based function. The example below uses the Arthritis dataset from the vcd package.

library(dplyr)
library(ggplot2)

data(Arthritis, package = "vcd")

head(Arthritis)

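# Bar chart of the proportion of each level of one categorical column
plotUniCat <- function(df, x) {
  x <- enquo(x)                          # capture the bare column name
  df %>%
    filter(!is.na(!!x)) %>%              # drop missing values
    count(!!x) %>%                       # frequency of each level
    mutate(prop = prop.table(n)) %>%     # counts -> proportions
    ggplot(aes(y = prop, x = !!x)) +
    geom_bar(stat = "identity")
}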

plotUniCat(Arthritis, Improved)

This produces a formatted plot with very little code, which is great, but only for one variable at a time.

I tried to plot several variables with a for loop, but it doesn't work: the code runs, but nothing happens.

variables <- c("Improved", "Sex", "Treatment")

for (i in variables) {
  plotUniCat(Arthritis, noquote(i))
}

I searched for this, but it's still not clear to me. Does anyone know what I am doing wrong, or how to make it work?

Thanks in advance.


Solution

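  • Change enquo to sym in the function, so that the variable name, passed as a string, is converted to a symbol. (With enquo, the function captured the expression noquote(i) itself rather than the column that i names.) That is,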

    plotUniCat <- function(df, x) {
      x <- sym(x)  # x is a string such as "Improved"; turn it into a symbol
      df %>%
        filter(!is.na(!!x)) %>%
        count(!!x) %>%
        mutate(prop = prop.table(n)) %>%
        ggplot(aes(y=prop, x=!!x)) +
        geom_bar(stat = "identity")
    }
    

    or, more concisely (note that this version plots counts rather than proportions),

    plotUniCat <- function(df, x) {
      x <- sym(x)
      df %>%
        filter(!is.na(!!x)) %>%
        ggplot(aes(x = as.factor(!!x))) +
        geom_bar()  # geom_bar() counts the rows per level by default
    }
    

    and then

    out <- lapply(variables, function(i) plotUniCat(Arthritis,i))
    

    Finally, use grid.arrange from the gridExtra package to display the plots together, e.g.

    library(gridExtra)
    do.call(grid.arrange, c(out, ncol = 2))
    

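The "nothing happens" symptom in the question also has a second cause, independent of enquo versus sym: a ggplot object is only drawn when it is printed, and R auto-prints the values of top-level expressions but not values computed inside the body of a for loop. If you prefer to keep the for loop, a minimal sketch is to call print() explicitly on each plot. It assumes the sym-based plotUniCat from the answer above; the list plots is an illustrative addition so the plots can be reused afterwards.

```r
library(dplyr)
library(ggplot2)

data(Arthritis, package = "vcd")

# sym()-based version from the answer: x is a column name given as a string
plotUniCat <- function(df, x) {
  x <- sym(x)
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y = prop, x = !!x)) +
    geom_bar(stat = "identity")
}

variables <- c("Improved", "Sex", "Treatment")

plots <- list()
for (v in variables) {
  p <- plotUniCat(Arthritis, v)
  # Values inside a for loop are not auto-printed, so draw the plot explicitly
  print(p)
  # Keep the plot, named by variable, for later use
  plots[[v]] <- p
}
```

Naming the list by variable makes it easy to pull out a single plot later, e.g. plots[["Sex"]]. Because count() keeps the factor type of the grouping column, the ordered levels of Improved stay in order on the x axis.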
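If you would rather keep the bare-name call of the original function, plotUniCat(Arthritis, Improved), and still loop over strings, one alternative (not part of the answer above) is rlang's ensym(), which accepts either a bare column name or a string and returns a symbol. When the name is stored in a variable, its value has to be injected with !!; otherwise ensym() would capture the variable's own name (v) instead of the column. A minimal sketch, assuming rlang is installed (dplyr depends on it); the name plotUniCat2 is hypothetical.

```r
library(dplyr)
library(ggplot2)

data(Arthritis, package = "vcd")

# Hypothetical variant: ensym() accepts a bare name or a string
plotUniCat2 <- function(df, x) {
  x <- rlang::ensym(x)
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y = prop, x = !!x)) +
    geom_bar(stat = "identity")
}

# Bare column name, as in the original call
p1 <- plotUniCat2(Arthritis, Improved)

# Column names stored in a variable: inject the value of v with !!
variables <- c("Improved", "Sex", "Treatment")
plots <- lapply(variables, function(v) plotUniCat2(Arthritis, !!v))
```

The !! is only needed because the column name arrives as the value of a variable; a literal string such as plotUniCat2(Arthritis, "Sex") works without it.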