Tags: r, ggplot2, histogram, facet

Draw histograms with two grouping variables


I'm trying to draw histograms with ggplot2 for data that has two grouping variables, each with two levels. I want to map one grouping variable to fill (and group) and use the other for faceting.
I also want the y axis to show the percentage within each combination of fill and facet.

My first idea was to summarize the data and use geom_bar like:

library(tibble)
library(dplyr)
library(ggplot2)

df <- tibble(
  x = round(rnorm(1000) * 5),
  fill = rep(c("a", "b"), 500),
  facet = rep(c("x", "y"), each = 500)
)
df %>%
  group_by(fill, facet, x) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # still grouped by fill, facet
  mutate(n = n / sum(n)) %>%                     # share within each fill/facet
  ggplot(aes(x = x, y = n, group = fill, fill = fill)) +
  geom_bar(stat = "identity", position = "dodge2") +
  facet_wrap(~ facet)

which produced this graph (image not included).

However, changing the bin width is cumbersome with this approach, so I would like to use geom_histogram instead.

Then I found this question: How to plot faceted histogram (not bar charts) with percents relative to each facet?
and came up with the following code:

df %>% ggplot(aes(
  x=x,
  y=stat(count/tapply(count, list(fill, PANEL), sum)[fill, PANEL]),
  group=fill,
  fill=fill
  )) + geom_histogram(binwidth=1, position="dodge2") + facet_wrap(~ facet)

But I got an error: Error in unit(x, default.units) : 'x' and 'units' must have length > 0.

Is there a good way to fix this?
Thank you for your help in advance!


Solution

  • Changing [fill, PANEL] to [PANEL] in the third line (the y aesthetic) gave me the expected output.

    df %>% ggplot(aes(
      x=x,
      y=stat(count/tapply(count, list(fill, PANEL), sum)[PANEL]),
      group=fill,
      fill=fill
      )) + geom_histogram(binwidth=1, position="dodge2") + facet_wrap(~ facet)
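
A caveat on the indexing, as a rough sketch rather than a definitive fix: tapply(count, list(fill, PANEL), sum) returns a fill-by-panel matrix, so [fill, PANEL] selects a whole block of rows and columns, and [PANEL] indexes by the factor's integer codes, which only reaches the first column. The [PANEL] version appears to work with the data above because all four fill/facet groups contain 250 rows, so every denominator is the same. The base R snippet below illustrates this on a small made-up table; the `bins` data frame and its counts are hypothetical stand-ins for the per-bin data the stat computes, not actual ggplot2 output.

```r
# Hypothetical stand-in for the per-bin table computed by the stat:
# one row per bin, with its count, fill level, and panel.
bins <- data.frame(
  count = c(10, 30, 5, 15, 20, 20, 40, 60),
  fill  = c("a", "a", "b", "b", "a", "a", "b", "b"),
  PANEL = factor(c(1, 1, 1, 1, 2, 2, 2, 2))
)

# 2 x 2 matrix of totals: rows are fill levels, columns are panels.
totals <- with(bins, tapply(count, list(fill, PANEL), sum))

# [fill, PANEL] picks a rows-by-columns block, not one total per bin.
block <- with(bins, totals[fill, PANEL])
dim(block)  # 8 8

# [PANEL] uses the factor's integer codes as linear indices, so it only
# ever reads entries 1 and 2, i.e. the first column of the matrix.
by_panel <- with(bins, totals[PANEL])

# ave() returns the fill/panel total aligned with each row.
denom <- with(bins, ave(count, fill, PANEL, FUN = sum))
share <- bins$count / denom
```

If this carries over to the plot, the y aesthetic could be written as after_stat(count / ave(count, fill, PANEL, FUN = sum)), assuming a ggplot2 version that provides after_stat() (3.3.0 or later). That mapping is an assumption and worth checking on unbalanced data before relying on it.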