r, ggplot2, histogram

How to label stacked histogram in ggplot


I am trying to add labels to the coloured segments of the bars in a histogram. Here is a reproducible example.

ggplot(mpg, aes(displ)) + geom_histogram(aes(fill = class), binwidth = 1, col = "black")

This code produces a histogram whose bars are coloured by car "class". Is there any way to add the "class" labels inside the corresponding colours in the graph?


Solution

  • The built-in functions geom_histogram and stat_bin are ideal for building plots quickly in ggplot. For more advanced styling, however, you often need to build the data yourself before plotting. In your case, labelling every segment would produce overlapping labels, which is visually messy.

    The following code builds a binned frequency table from the data frame:

    library(ggplot2)  # provides the mpg dataset
    library(plyr)     # provides ddply()

    # Subset data
    mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)

    # Bin data into unit-width bins (0.5, 1.5], (1.5, 2.5], ...,
    # so that bin k is centred on displ = k
    breaks <- 1
    cuts <- seq(0.5, 8, breaks)
    mpg_df$bin <- .bincode(mpg_df$displ, cuts)

    # Count the rows in each class/bin combination
    mpg_df <- ddply(mpg_df, .(class, bin), nrow)
    names(mpg_df) <- c("class", "bin", "Freq")
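
    If you would rather not depend on plyr, the same frequency table can probably be built with base R alone. This is a minimal sketch, assuming the same unit-width cuts as above; the name counts is hypothetical, and zero-count combinations are dropped so that the result has the same shape as the ddply output:

    ```r
    library(ggplot2)  # only needed here for the mpg dataset

    # Same unit-width bins as above: (0.5, 1.5], (1.5, 2.5], ...
    cuts <- seq(0.5, 8, 1)
    bins <- .bincode(mpg$displ, cuts)

    # Cross-tabulate class against bin and flatten to a long data frame
    counts <- as.data.frame(table(class = mpg$class, bin = bins),
                            stringsAsFactors = FALSE)

    # Keep only the class/bin combinations that actually occur
    counts <- counts[counts$Freq > 0, ]

    # table() stores the bin index as a character label; convert it back
    counts$bin <- as.integer(counts$bin)
    ```

    The result has the columns class, bin and Freq, so it should be usable wherever the plotting code expects mpg_df.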
    

    You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:

    ggplot(mpg_df, aes(x = bin, y = Freq,  fill = class)) +
      geom_bar(stat = "identity", colour = "black", width = 1) +
      geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
       position=position_stack(vjust=0.5), colour="black")
    

    I don't think it makes much sense to duplicate the class labels, since they already appear in the legend. It may be more useful to show the frequency of each group:

    ggplot(mpg_df, aes(x = bin, y = Freq,  fill = class)) +
      geom_bar(stat = "identity", colour = "black", width = 1) +
      geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
       position=position_stack(vjust=0.5), colour="black")
    

    Update

    I realised you can selectively filter the labels using ggplot's special computed variable ..count.., so there is no need to preformat the data!

    ggplot(mpg, aes(x = displ, fill = class, label = class)) +
      geom_histogram(binwidth = 1, col = "black") +
      stat_bin(binwidth = 1, geom = "text", position = position_stack(vjust = 0.5),
               aes(label = ifelse(..count.. > 4, ..count.., "")))
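
    Newer ggplot2 releases deprecate the ..count.. notation in favour of after_stat(count). A minimal sketch of the same plot in that style, assuming ggplot2 3.3.0 or later (where after_stat() was introduced); the variable name p is only for illustration:

    ```r
    library(ggplot2)

    p <- ggplot(mpg, aes(x = displ, fill = class)) +
      geom_histogram(binwidth = 1, colour = "black") +
      stat_bin(
        binwidth = 1, geom = "text", colour = "black",
        position = position_stack(vjust = 0.5),
        # Label a segment with its count only when the count exceeds 4
        aes(label = ifelse(after_stat(count) > 4, after_stat(count), ""))
      )
    p
    ```

    The threshold of 4 is carried over from the code above; adjust it to suit how crowded the labels look.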
    

    This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)

    This second approach only works if you want to label the bars with their counts. To label them by class or another variable, you will have to prebuild the data frame using the first method.