Tags: r, ggplot2, p-value

Horizontal p-value brackets on a geom_point plot


I'm new to R and I'm struggling with some plotting in ggplot.

I have some monthly data I simply plotted as points connected with lines.

  ggplot(data=df, aes(x=x,y=y)) + 
  geom_line(aes(group=g)) + geom_point() 


Now I'd like to add the results of pairwise Wilcoxon tests between the three groups. It should look like this: [image]

I'm a bit confused: I know stat_pvalue_manual works with categories, but I have a continuous y axis, and the p-value annotation should be horizontal.

Maybe there are other functions that can do this. Does anyone have an example of how this could be done?

Thanks in advance.

structure(list(x = c("April", "April", "April", "May", "May", "May", "June", "June", "June", "July", "July", "July", "August", "August", "August", "September", "September", "September", "October", "October", "October", "November", "November", "November", "December", "December", "December", "January", "January", "January", "February", "February", "February"), g = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c"), y = c(4.748, 5.3388, 5.7433, 4.744, 5.4938, 6.1583, 4.767, 5.6, 6.2067, 4.889, 5.8363, 6.295, 4.887, 5.6413, 6.15, 4.94, 5.73, 6.1833, 4.974, 5.2113, 5.77, 5.022, 5.47, 5.9117, 4.964, 5.3425, 5.7217, 4.95, 5.15, 5.9833, 4.75, 5.425, 5.7833)), class = "data.frame", row.names = c(NA, -33L))


Solution

  • A few things make this fiddly. The main ones are that your x-axis is discrete while stat_pvalue_manual seems to work only with continuous scales, and that a coordinate swap is needed. As a result, the month factor needs to be ordered, geom_line has to be replaced with geom_path (which connects points in data order rather than in order of x), and the mean of each group needs to be calculated and substituted into the stat.test object. This results in:

    library(ggplot2)
    library(ggpubr)  # provides compare_means() and stat_pvalue_manual()

    # Test data
    df <- structure(list(x = c("April", "April", "April", "May", "May", "May", "June", "June", "June", "July", "July", "July", "August", "August", "August", "September", "September", "September", "October", "October", "October", "November", "November", "November", "December", "December", "December", "January", "January", "January", "February", "February", "February"), g = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c"), y = c(4.748, 5.3388, 5.7433, 4.744, 5.4938, 6.1583, 4.767, 5.6, 6.2067, 4.889, 5.8363, 6.295, 4.887, 5.6413, 6.15, 4.94, 5.73, 6.1833, 4.974, 5.2113, 5.77, 5.022, 5.47, 5.9117, 4.964, 5.3425, 5.7217, 4.95, 5.15, 5.9833, 4.75, 5.425, 5.7833)), class = "data.frame", row.names = c(NA, -33L))
    df$x <- factor(df$x, levels=unique(df$x))
    
    # Pairwise comparisons between groups; compare_means() uses the Wilcoxon test by default
    stat.test <- compare_means(
      y ~ g, data = df
    )
    
    #Calculate mean values by group
    means <- aggregate(df$y, list(g=df$g), mean)
    means2 <- means$x
    names(means2) <- means$g
    
    # Replace the group labels with the group means so the brackets span numeric positions
    stat.test$group1 <- means2[stat.test$group1]
    stat.test$group2 <- means2[stat.test$group2]
    stat.test$y.position <- c(13, 13.5, 13)  # arbitrary location for plotting brackets
    
    #Modify the plot
    ggplot(data=df, aes(x=y,y=as.numeric(x))) + 
      geom_path(aes(group=g)) + 
      geom_point() + 
      stat_pvalue_manual(stat.test, coord.flip = TRUE) + coord_flip() + 
      scale_y_continuous("Month", labels=levels(df$x), 
                         breaks=seq_along(levels(df$x)), minor_breaks = 1)
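
To make the label-to-position lookup in the code above easier to follow, here is a minimal base R sketch of the same idea. It is an illustration under stated assumptions, not the answer's method: the small data frame is made up, the names `group_means` and `pairs` are hypothetical, and the p-values come from plain `wilcox.test()` calls without the multiple-testing adjustment that `compare_means()` also reports.

```r
# Small illustrative data set (made-up values, not the question's data)
df <- data.frame(
  g = rep(c("a", "b", "c"), times = 4),
  y = c(4.7, 5.3, 5.7, 4.8, 5.5, 6.2, 4.9, 5.6, 6.1, 5.0, 5.4, 5.9)
)

# Named numeric vector of group means; the names are the group labels
group_means <- vapply(split(df$y, df$g), mean, numeric(1))

# One row per pairwise comparison: a-b, a-c, b-c
pairs <- as.data.frame(t(combn(names(group_means), 2)),
                       stringsAsFactors = FALSE)
names(pairs) <- c("group1", "group2")

# Unadjusted Wilcoxon rank-sum p-value for each pair of groups
pairs$p <- mapply(
  function(g1, g2) wilcox.test(df$y[df$g == g1], df$y[df$g == g2])$p.value,
  pairs$group1, pairs$group2,
  USE.NAMES = FALSE
)

# Indexing the named vector by label turns each label into a numeric position
pairs$xmin <- unname(group_means[pairs$group1])
pairs$xmax <- unname(group_means[pairs$group2])

print(pairs)
```

Indexing a named vector by a character column is what lets each bracket start and end at a group's mean value instead of at a category label.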
    
