Adding summary statistics to a graph using the annotation feature in ggplot2


I would like to make a plot similar to this one for the iris data, with summary statistics printed on the plot: https://i.sstatic.net/Zyv2s.jpg

I am following this post: How to add summary statistics in histogram plot using ggplot2?

library(reshape2)  # melt()
library(dplyr)     # %>%, group_by(), summarize()
library(ggplot2)

df <- iris
df.m <- melt(df, id="Species")

#Calculating the summary statistics
summ <- df.m %>% 
  group_by(variable) %>% 
  summarize(min = min(value), max = max(value), 
            mean = mean(value), q1= quantile(value, probs = 0.25), 
            median = median(value), q3= quantile(value, probs = 0.75),
            sd = sd(value))

I then modified the code to make density plots instead of histograms:

p1 <- ggplot(df.m) + geom_density(aes(x = value), fill = "grey", color = "black") + 
    facet_wrap(~variable, scales="free", ncol = 2)+ theme_bw()

I seem to be having a problem over here:

p1+geom_density(data=summ,label =split(summ,summ$variable),
npcx = 0.00, npcy = 1, hjust = 0, vjust = 1,size=2)

Does anyone know what the problem is? Also, is it possible to accomplish this with only ggplot2? I am working with a computer where I do not have the admin privileges to download many libraries (I have reshape2, dplyr, ggplot2). Should this be done using the annotate() function in ggplot2? And is there a way to change the x-axis for each graph to "log"?


Solution

  • Since you only have a few packages available, I would suggest the following approach. You can add the summary as a text annotation with geom_text(), although you will need to adjust the text position for each panel. A log transformation is also possible if you apply log() to the variable inside aes(). I will show two ways to place the annotations.

    library(ggplot2)
    library(dplyr)
    library(reshape2)  # provides melt()
    
    #Data
    df <- iris
    df.m <- melt(df, id="Species")
    

    Here, we create the annotations:

    #Calculating the summary statistics and create the label
    summ <- df.m %>% 
      group_by(variable) %>% 
      summarize(min = min(value), max = max(value), 
                mean = mean(value), q1= quantile(value, probs = 0.25), 
                median = median(value), q3= quantile(value, probs = 0.75),
                sd = sd(value)) %>%
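      mutate_if(is.numeric, round, digits=2) %>%
      # paste0() avoids the extra space that paste() inserts between its arguments
      mutate(lab = paste0("min = ", min, "\nmax = ", max, "\nmean = ", mean, 
                          "\nq1 = ", q1, "\nmedian = ", median, "\nq3 = ", q3, "\nsd = ", sd),
             # x position of each label, one value per variable
             position=c(1.5, 0.8, 0.25, -2)) %>% select(variable, lab, position)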
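
    As a side note, mutate_if() is superseded in dplyr 1.0.0 and later in favour of across(). A minimal sketch of the same label construction in that style, assuming dplyr >= 1.0.0 is installed (with an older version, keep mutate_if()). The name summ2 is only for illustration, and the position column is left out:

    ```r
    library(dplyr)
    library(reshape2)

    df.m <- melt(iris, id = "Species")

    summ2 <- df.m %>%
      group_by(variable) %>%
      summarize(min = min(value), max = max(value), mean = mean(value),
                q1 = quantile(value, probs = 0.25), median = median(value),
                q3 = quantile(value, probs = 0.75), sd = sd(value),
                .groups = "drop") %>%
      # Round every numeric column first, then build one multi-line label per variable
      mutate(across(where(is.numeric), ~ round(.x, 2)),
             lab = paste0("min = ", min, "\nmax = ", max, "\nmean = ", mean,
                          "\nq1 = ", q1, "\nmedian = ", median,
                          "\nq3 = ", q3, "\nsd = ", sd))

    # Inspect one label before plotting
    cat(summ2$lab[summ2$variable == "Sepal.Length"])
    ```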
    

    The position column created in the previous step sets the x coordinate of each label, one value per variable. Because the plot maps x to log(value), these values are on the log scale; change them to move the labels. The code for the plot is then:

    #Plot
    p1 <- ggplot(df.m) + geom_density(aes(x = log(value)), fill = "grey", color = "black") + 
      facet_wrap(~variable, scales="free", ncol = 2)+ theme_bw()
    p1 <- p1 + geom_text(data = summ, aes(x=position, label = lab), y=Inf, hjust=1, vjust=1.2, size=3)
    p1
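
    For the log x-axis there is also an alternative, shown here only as a sketch and not as the approach used in this answer: scale_x_log10() transforms the axis rather than the variable, so the tick labels stay in the original units. Note that it uses base 10, whereas log() inside aes() is the natural log. The name p_log is only for illustration:

    ```r
    library(ggplot2)
    library(reshape2)

    df.m <- melt(iris, id = "Species")

    # Density of each variable on a log10 x-axis; ticks are labelled in original units
    p_log <- ggplot(df.m, aes(x = value)) +
      geom_density(fill = "grey", color = "black") +
      scale_x_log10() +
      facet_wrap(~variable, scales = "free", ncol = 2) +
      theme_bw()
    p_log
    ```

    With this variant, any label x positions supplied through a column would have to be given in the original units of value, not in log units.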
    


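    In the first version, each label's x position comes from the position column in summ. To avoid choosing positions by hand, set x = Inf and y = Inf so that every label sits in the top-right corner of its panel: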

    p1 <- ggplot(df.m) + geom_density(aes(x = log(value)), fill = "grey", color = "black") + 
      facet_wrap(~variable, scales="free", ncol = 2) + theme_bw()
    p1 <- p1 + geom_text(data = summ, aes(label = lab), x = Inf, y = Inf, hjust = 1, vjust = 1.2, size = 3)
    p1
    


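    Either option works. The call in your question most likely failed because geom_density() does not accept label, npcx or npcy arguments; table-style annotations of that kind rely on additional packages (possibly grid and gridExtra), which you may not have installed.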
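
    On the annotate() part of the question: annotate() carries no facetting variable, so on a facetted plot it draws the same annotation in every panel. That is why the code in this answer uses geom_text() with a data frame that has one row per variable. A small sketch of the difference, where the names base, p_same, p_each and labs_df are only for illustration:

    ```r
    library(ggplot2)
    library(reshape2)

    df.m <- melt(iris, id = "Species")

    base <- ggplot(df.m) +
      geom_density(aes(x = value), fill = "grey", color = "black") +
      facet_wrap(~variable, scales = "free", ncol = 2) +
      theme_bw()

    # annotate(): one fixed text, repeated identically in all four panels
    p_same <- base + annotate("text", x = Inf, y = Inf, label = "n = 150",
                              hjust = 1, vjust = 1.2, size = 3)

    # geom_text() with one row per facet level: each panel gets its own text
    labs_df <- data.frame(
      variable = factor(levels(df.m$variable), levels = levels(df.m$variable)),
      lab = paste("panel:", levels(df.m$variable))
    )
    p_each <- base + geom_text(data = labs_df, aes(label = lab), x = Inf, y = Inf,
                               hjust = 1, vjust = 1.2, size = 3)
    ```

    annotate() is therefore fine for a note that should be the same everywhere, but per-panel summary statistics need the data-frame approach.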