
ggplot2, stacked histogram, and summary labels


I have event data (event types A, B, C, and D below) that occur at four locations (1, 2, 3, 4 below). I want to plot them as a stacked bar chart, with each bar filled to show how much each event type (A, B, C, D) contributes at that location, and I want to show the integer value of each contribution. The code below sort of shows the individual values, but I'd also like to show the total for each location, which I can't figure out how to do.

So there are two problems:

1. Printing not only the individual values in a stacked bar but also (or even separately, or only) the total value at the top of the bar.
2. The text labels are drawn at a y position equal to their own value, so they overwrite each other and don't line up with the segments of the bar. I'd prefer them somewhere predictable inside each segment, such as the middle or the top.

a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)

To summarize this, I use table():

table(df$a, df$b)

  A B C D
1 2 2 2 1
2 2 1 1 1
3 0 2 2 0
4 1 0 1 2
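The per-location totals asked about here are simply the row sums of this table. As a base-R sketch, addmargins() (from the stats package, attached by default) appends sums to the table, and rowSums() returns the per-location totals as a vector:

```r
# Same event data as above
a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)

tab <- table(df$a, df$b)

# Add a row of column sums and a column of row sums to the table
print(addmargins(tab))

# Total number of events at each location (one value per bar)
totals <- rowSums(tab)
print(totals)
```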

Now back to a data.frame for plotting with ggplot:

df2 <- data.frame(table(df$a, df$b))
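data.frame() converts the two-way table to long format (via the table method of as.data.frame()): one row per cell, with columns Var1, Var2, and Freq, and Var1 varying fastest. That row order matters if you later index rows by position. A quick check:

```r
a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)

df2 <- data.frame(table(df$a, df$b))

str(df2)       # 16 rows, 3 columns: Var1, Var2, Freq
head(df2, 4)   # rows 1-4 are event A at locations 1-4
```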

Then plot it:

library(ggplot2)
ggplot(df2, aes(x=Var1, y=Freq, fill=Var2, label=Freq)) + 
  geom_bar(stat="identity") + 
  geom_text(stat="identity")
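For the second problem, one common fix is to stack the text with the same position adjustment as the bars, so each label lands inside its own segment; `vjust = 0.5` puts it at the segment's middle. This is a sketch assuming ggplot2 2.2.0 or later (where position_stack() has a vjust argument). Passing only the nonzero rows to the text layer avoids stray 0 labels:

```r
library(ggplot2)

a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)
df2 <- data.frame(table(df$a, df$b))

p <- ggplot(df2, aes(x = Var1, y = Freq, fill = Var2, label = Freq)) +
  geom_bar(stat = "identity") +
  # Stack the labels exactly like the bars (grouped by the inherited fill);
  # vjust = 0.5 centres each label vertically within its segment
  geom_text(data = subset(df2, Freq > 0),
            position = position_stack(vjust = 0.5))
print(p)
```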

I would really appreciate any help. Do I really need to run my data frame through table() to summarize it and then convert it back into a data frame? And can I get at the total height of each bar so I can print it as a label?

I feel like if I weren't using fill, I could get at the ..count.. value with stat="bin", but since I've switched to stat="identity" I can't seem to get at that summary value.
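On the ..count.. point: with a recent ggplot2 the table() round trip is not strictly needed. This sketch assumes ggplot2 3.3.0 or later for after_stat(), the successor to the ..count.. notation. geom_bar()'s default stat = "count" tallies the raw rows, and geom_text() can use the same stat twice: grouped by event and stacked for the segment labels, and counted by location alone for the bar totals. Location/event combinations with no events produce no row at all, so no 0 labels appear.

```r
library(ggplot2)

# Raw event data; location as a factor so the x axis is discrete
df <- data.frame(
  a = factor(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)),
  b = c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
)

p <- ggplot(df, aes(x = a)) +
  # stat = "count" (the default) counts rows per location and event
  geom_bar(aes(fill = b)) +
  # Per-event counts, grouped and stacked like the bars, centred in each segment
  geom_text(aes(label = after_stat(count), group = b), stat = "count",
            position = position_stack(vjust = 0.5), colour = "white") +
  # Counting by location only gives the total height of each bar, printed above it
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = -0.3)
print(p)
```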

Thanks!


Solution

  • I would summarize the data as you have in order to produce your desired plot. As for the labels, you also need to create variables that define where each label should be placed on the graph.

    a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
    b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
    df <- data.frame(a, b)
    df2 <- data.frame(table(df$a, df$b))
    

    Now create a variable for the overall count at each location:

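    # Rows 1-4 of df2 are event A at locations 1-4 (Var1 varies fastest), so each
    # location's total is stored exactly once; the remaining rows stay NA
    df2$overall <- NA
    df2$overall[1:length(unique(df2$Var1))] <- xtabs(Freq~Var1,data=df2)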
    

    Now create a variable for the label positions within each bar using ddply() from the plyr package:

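    library(plyr)
    # Top edge of each segment. ggplot2 >= 2.2.0 stacks the first level (A) on top,
    # so accumulate from the last level (D, at the bottom) upward
    df2 <- ddply(df2, "Var1", transform, cumvars=rev(cumsum(rev(Freq))))
    # Replace zero counts with NA so no "0" labels are printed
    df2$Freq2 <- ifelse(df2$Freq==0,NA,df2$Freq)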
    
    
    library(ggplot2)
    
    ggplot(df2, aes(x=Var1, y=Freq, fill=Var2, label=Freq)) + 
      geom_bar(stat="identity") + 
      geom_text(aes(x=Var1, y=overall, label=overall),vjust=-.2,stat="identity") + 
      geom_text(aes(x=Var1, y=cumvars, label=Freq2),vjust=1.5, colour="white", stat="identity")
    

    You can change the size, colour, position, etc. of the labels to make the graph look nice.
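If you'd rather not depend on plyr (which is retired), the same helper columns can be computed in base R. This sketch uses aggregate() for the per-location totals, kept in their own small data frame instead of padding a column with NA, and ave() for the top edge of each segment. Because ggplot2 2.2.0 and later draw the first fill level (A) at the top of the stack, the running sum is taken from the last level upward. The column names `top` and `total` are chosen just for this example.

```r
library(ggplot2)

a <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,2)
b <- c('A','B','C','D','A','A','B','C','B','B','C','C','C','D','D','A','A','B','C','D')
df <- data.frame(a, b)
df2 <- data.frame(table(df$a, df$b))

# One row per location with its total count (hypothetical column name "total")
totals <- aggregate(Freq ~ Var1, data = df2, FUN = sum)
names(totals)[2] <- "total"

# Top edge of each segment: within a location, sum Freq from D (bottom) up to A (top)
df2$top <- ave(df2$Freq, df2$Var1, FUN = function(f) rev(cumsum(rev(f))))

p <- ggplot(df2, aes(x = Var1, y = Freq, fill = Var2)) +
  geom_col() +
  # Segment counts just below each segment's top edge; zero counts are skipped
  geom_text(data = subset(df2, Freq > 0), aes(y = top, label = Freq),
            vjust = 1.5, colour = "white") +
  # Totals just above each bar; totals has no Var2/Freq, so don't inherit aes
  geom_text(data = totals, aes(x = Var1, y = total, label = total),
            inherit.aes = FALSE, vjust = -0.2)
print(p)
```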