Tags: r, ggplot2, dplyr, histogram, data-visualization

ggplot draw multiple plots by levels of a variable


I have a sample dataset

d=data.frame(n=rep(c(1,1,1,1,1,1,2,2,2,3),2),group=rep(c("A","B"),each=20),stringsAsFactors = F)

I want to draw two separate histograms, one for each level of the group variable.

I tried the method suggested by @jenesaisquoi in a separate post, "Generating Multiple Plots in ggplot by Factor":

ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)+facet_wrap(~group)

Histogram output

This produces one panel per group, but if you look closely, the proportions are wrong. They are calculated over the whole dataset rather than within each group. I want the proportion for number 1 to be 0.6 in each group, not 0.3.

Then I tried the dplyr package, but it did not even create two graphs: the group_by call was ignored. The proportion is right this time, though.

d%>%group_by(group)%>%ggplot(data=.)+geom_histogram(aes(x=n,y=..count../sum(..count..)),binwidth = 1)

dplyr output

Finally, I tried mapping group to color:

ggplot(data=d)+geom_histogram(aes(x=n,y=..count../sum(..count..),color=group),binwidth = 1)

But the result is far from ideal. I would accept a single plot, but only with the bars side by side, not on top of each other.

color=group output

In conclusion, I want to draw two separate histograms with the proportions calculated within each group. If there is no easy way to do this, I can live with a single graph, as long as the bars are side by side and the proportions are correct for each group. In this example, number 1 should have a proportion of 0.6.


Solution

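  • Changing ..count../sum(..count..) to ..density.. gives the desired proportions. Density is computed separately within each facet panel, and with binwidth = 1 it equals the within-group proportion (for other bin widths, multiply the density by the bin width). In ggplot2 3.4.0 and later, the ..density.. notation is deprecated in favor of after_stat(density).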

    ggplot(data=d) + geom_histogram(aes(x=n, y=..density..), binwidth = 1) + facet_wrap(~group)
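
For the fallback asked about in the question (a single plot with the bars side by side), one possible approach is to compute the within-group proportions first and then plot them with geom_col. This is only a sketch, not the answerer's method: the names props, count and prop are illustrative, and the data frame is built with rep(..., 4) so that it has the same 40 rows as the original without relying on recycling.

```r
library(dplyr)
library(ggplot2)

# Same data as in the question: 20 rows per group (twelve 1s, six 2s, two 3s).
d <- data.frame(
  n = rep(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3), 4),
  group = rep(c("A", "B"), each = 20),
  stringsAsFactors = FALSE
)

# Count each value of n within each group, then divide by the group total.
# `props`, `count` and `prop` are illustrative names, not from the original post.
props <- d %>%
  count(group, n, name = "count") %>%
  group_by(group) %>%
  mutate(prop = count / sum(count)) %>%
  ungroup()

# One panel, with the bars for the two groups placed side by side.
p <- ggplot(props, aes(x = factor(n), y = prop, fill = group)) +
  geom_col(position = "dodge") +
  labs(x = "n", y = "proportion within group")
p
```

Because the proportions are computed before plotting, they do not depend on the bin width, and replacing the fill mapping with facet_wrap(~group) should give the two-panel version with the same values.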