
ggplot2 histogram showing proportion of group by bin instead of count


Assuming a dataset made of two groups:

library(ggplot2)

dataA <- rnorm(200, 3, sd = 2)
dataB <- rnorm(500, 5, sd = 3)
all <- data.frame(dataset = c(rep('A', length(dataA)), rep('B', length(dataB))),
                  value = c(dataA, dataB))

We can plot the histogram with the two groups like this:

ggplot(all, aes(value, fill = dataset)) +
  geom_histogram(bins = 50, position = 'stack')

I would like to obtain the same kind of plot but with the proportion of each group instead of the count for every bin.

I found the following way to do it by calculating the proportion manually for each group:

ggplot(all, aes(x = value, fill = dataset)) +
  geom_histogram(
    aes(y = c(
      ..count..[..group.. == 1] / (..count..[..group.. == 1] + ..count..[..group.. == 2]),
      ..count..[..group.. == 2] / (..count..[..group.. == 1] + ..count..[..group.. == 2])
    )),
    position = 'stack', bins = 50
  ) +
  ylab('proportion')

This gives the expected result (below), but it is a very inelegant solution. I'm probably missing something here. Is there a better way to obtain the same (or a similar) result?

[Image: stacked histogram showing the proportion of each group in every bin]


Solution

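  • You might be looking for position = 'fill' instead of 'stack'. It rescales the stacked bars in every bin to a total height of 1, so the y axis shows each group's proportion within the bin rather than its count.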

    library(ggplot2)
    set.seed(42)
    
    dataA <- rnorm(200, 3, sd = 2)
    dataB <- rnorm(500, 5, sd = 3)
    
    all <- data.frame(
      dataset = c(rep('A', length(dataA)),rep('B', length(dataB))),
      value   = c(dataA, dataB)
    )
    
    ggplot(all, aes(value, fill = dataset)) +
      geom_histogram(bins = 50, position = 'fill')
    #> Warning: Removed 14 rows containing missing values (geom_bar).
    

    Created on 2022-01-15 by the reprex package (v2.0.1)
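
To see what position = 'fill' computes, and where the warning most likely comes from, here is a minimal base R sketch that bins the same data by hand. The `breaks` vector is an assumption made for illustration: geom_histogram() places its bin edges differently, so the exact numbers (and the number of empty bins) will not match the plot. The idea is the same, though: each group's count in a bin is divided by the bin total, and a bin with no observations gives 0/0 = NaN, which ggplot2 drops with a "missing values" warning.

```r
set.seed(42)

dataA <- rnorm(200, 3, sd = 2)
dataB <- rnorm(500, 5, sd = 3)

all <- data.frame(
  dataset = c(rep('A', length(dataA)), rep('B', length(dataB))),
  value   = c(dataA, dataB)
)

# 50 equal-width bins spanning the data.
# Assumption: these edges are illustrative and are not the ones
# geom_histogram() would choose.
breaks <- seq(min(all$value), max(all$value), length.out = 51)
bin    <- cut(all$value, breaks = breaks, include.lowest = TRUE)

# Count per group (rows) and bin (columns), then divide every column
# by its total to get the within-bin proportion of each group.
counts <- table(all$dataset, bin)
props  <- prop.table(counts, margin = 2)

# Non-empty bins sum to 1; empty bins yield NaN (0/0).
colSums(props)
```

If the NaN bins are a concern, using fewer bins (so that none is empty) should make the warning disappear without changing the approach.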