Sometimes, when performing exploratory analysis or producing reports, we want to plot univariate distributions for many variables. I could do this by faceting the plot after reshaping the data to a tidy long format, but some of the variables are ordered factors and I want to keep their order on the plots.
So, to accomplish it in a more efficient way, I built a simple dplyr/ggplot based function. I made the example below using the Arthritis dataset from the vcd package.
library(dplyr)
library(ggplot2)
data(Arthritis, package = "vcd")
head(Arthritis)
plotUniCat <- function(df, x) {
  x <- enquo(x)
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y = prop, x = !!x)) +
    geom_bar(stat = "identity")
}
plotUniCat(Arthritis, Improved)
This gives me a formatted plot with very little code, which is cool, but only for one variable at a time.
I tried to call it for more than one variable with a for loop, but it's not working. The code runs without errors, but no plot is shown.
variables <- c("Improved", "Sex", "Treatment")
for (i in variables) {
  plotUniCat(Arthritis, noquote(i))
}
I searched for this, but it's still not clear to me. Does anyone know what I am doing wrong, or how to make it work?
Thanks in advance.
Change the enquo in the function to sym, to convert the variable name string to a symbol. That is,
plotUniCat <- function(df, x) {
  x <- sym(x)  # x is now a string such as "Improved"; turn it into a symbol
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y = prop, x = !!x)) +
    geom_bar(stat = "identity")
}
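or, more concisely (note that this version plots counts rather than proportions),
plotUniCat <- function(df, x) {
  x <- sym(x)
  df %>%
    filter(!is.na(!!x)) %>%
    ggplot(aes(x = as.factor(!!x))) +
    geom_bar()  # geom_bar() counts rows by default, same as stat = "count"
}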
and then call it once per variable name, collecting the plots in a list:
out <- lapply(variables, function(i) plotUniCat(Arthritis, i))
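As for why the original for loop ran without showing anything: a ggplot object is only drawn when it is printed, and R does not auto-print values inside a for loop. If you prefer to keep the loop, a minimal sketch would be to call print() explicitly. The toy data frame below is made up for illustration, so that the snippet runs without the vcd package; with the real data you would pass Arthritis instead.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for Arthritis (not the real dataset)
toy <- data.frame(
  Improved  = factor(c("None", "Some", "Marked", "None", "Marked"),
                     levels = c("None", "Some", "Marked"), ordered = TRUE),
  Sex       = factor(c("Male", "Female", "Female", "Male", "Female")),
  Treatment = factor(c("Placebo", "Treated", "Treated", "Placebo", "Treated"))
)

plotUniCat <- function(df, x) {
  x <- sym(x)  # convert the column name string to a symbol
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y = prop, x = !!x)) +
    geom_bar(stat = "identity")
}

variables <- c("Improved", "Sex", "Treatment")

# Inside a loop the plot must be printed explicitly to be drawn
for (v in variables) {
  print(plotUniCat(toy, v))
}
```

This draws the plots one after another on the active device, so the lapply approach is still the better fit if you want them side by side.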
Finally, use grid.arrange to display the plots. E.g.
library(gridExtra)
do.call(grid.arrange, c(out, ncol = 2))
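If the do.call form looks cryptic, grid.arrange also takes a list of plots directly through its grobs argument, which should give the same layout. A small sketch, using two placeholder plots built from mtcars in place of the out list above (the placeholders are only there to make the snippet self-contained):

```r
library(ggplot2)
library(gridExtra)

# Hypothetical placeholder plots standing in for the `out` list
out <- list(
  ggplot(mtcars, aes(x = factor(cyl))) + geom_bar(),
  ggplot(mtcars, aes(x = factor(gear))) + geom_bar()
)

# Pass the list via `grobs` instead of splicing it with do.call
grid.arrange(grobs = out, ncol = 2)
```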