Computation failed when using geom_signif with geom_col


I'm trying to plot dodged columns with two grouping levels: first the "REGION" group, then the "name" subgroup, which distinguishes "Pre" and "Post" intervention. I tried ggsignif with geom_signif(comparisons = list(c("Pre_Ambos", "Post_Ambos"))), but it gives me: "Warning message: Computation failed in stat_signif() Caused by error in if (scales$x$map(comp[1]) == data$group[1] | manual) ...: ! missing value where TRUE/FALSE needed". Here is my code:

library(tidyverse)

# Build dataframe
REGION <- c("Arica", "Tarapacá", "Antofagasta", "Atacama", "Coquimbo", "Valparaíso", "Metropolitana", "O'Higgins", "Maule", "Ñuble", "Bíobío", "Araucanía", "Los Ríos", "Los Lagos", "Aysén", "Magallanes", "Chile")
Pre_Ambos <- c(11.33, 9.96, 10.24, 14.17, 13.43, 12.96, 11.47, 14.54, 14.58, 18.00, 12.19, 15.34, 16.10, 17.64, 16.34, 15.04, 13.96)
Post_Ambos <- c(8.54, 7.60, 7.86, 10.44, 10.01, 11.97, 9.45, 13.07, 13.76, 11.56, 10.37, 14.48, 13.14, 15.04, 14.74, 12.07, 11.51)
Dif_Ambos_Porc <- c(24.61, 23.74, 23.20, 26.30, 25.49, 7.67, 17.59, 10.10, 5.67, 35.80, 14.94, 5.60, 18.37, 14.74, 9.78, 19.76, 17.57)

Table1 <- data.frame(REGION, Pre_Ambos, Post_Ambos,  Dif_Ambos_Porc)
View(Table1)


#Pivoting
Table1 |> 
  select(REGION, Pre_Ambos, Post_Ambos, Dif_Ambos_Porc) |> 
  pivot_longer(cols = c("Pre_Ambos", "Post_Ambos"))  |> 
  mutate(name = forcats::fct_relevel(name, c("Pre_Ambos", "Post_Ambos"))) -> Table1_p 


#Plotting
ggplot(Table1_p, aes(x = REGION, y = value, fill = name)) +
  geom_col(position = "dodge") +
  scale_x_discrete(limits = Table1_p$REGION)

Here is the resulting plot, without significance annotations (image not shown).

I'm only interested in the significance between "Pre_Ambos" and "Post_Ambos" in each REGION. Thanks a lot


Solution

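  • Your expected output doesn't make sense to me: in your example you have a single 'Pre_Ambos' and a single 'Post_Ambos' value for each region, so you would be running a statistical test (t-test or Wilcoxon test) that compares one value against one value. The error itself most likely comes from geom_signif() looking up the names given in comparisons on the x axis: they need to be x-axis categories, and in your plot x is REGION, not name. If you put name on the x axis and facet by REGION, you can do this, e.g.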

    library(tidyverse)
    library(ggsignif)
    
    # Build dataframe 
    REGION <- c("Arica", "Tarapacá", "Antofagasta", "Atacama", "Coquimbo", "Valparaíso", "Metropolitana", "O'Higgins", "Maule", "Ñuble", "Bíobío", "Araucanía", "Los Ríos", "Los Lagos", "Aysén", "Magallanes", "Chile")
    Pre_Ambos <- c(11.33, 9.96, 10.24, 14.17, 13.43, 12.96, 11.47, 14.54, 14.58, 18.00, 12.19, 15.34, 16.10, 17.64, 16.34, 15.04, 13.96)
    Post_Ambos <- c(8.54, 7.60, 7.86, 10.44, 10.01, 11.97, 9.45, 13.07, 13.76, 11.56, 10.37, 14.48, 13.14, 15.04, 14.74, 12.07, 11.51)
    Dif_Ambos_Porc <- c(24.61, 23.74, 23.20, 26.30, 25.49, 7.67, 17.59, 10.10, 5.67, 35.80, 14.94, 5.60, 18.37, 14.74, 9.78, 19.76, 17.57)
    
    Table1 <- data.frame(REGION, Pre_Ambos, Post_Ambos,  Dif_Ambos_Porc)
    
    #Pivoting
    Table1 |>
      select(REGION, Pre_Ambos, Post_Ambos, Dif_Ambos_Porc) |> 
      pivot_longer(cols = c("Pre_Ambos", "Post_Ambos"))  |> 
      mutate(name = forcats::fct_relevel(name, c("Pre_Ambos", "Post_Ambos"))) -> Table1_p 
    
    
    #Plotting
    ggplot(Table1_p, aes(x = name, y = value, fill = name)) +
      geom_col(position = "dodge") +
      #scale_x_discrete(limits = REGION) +
      geom_signif(comparisons = list(c("Pre_Ambos", "Post_Ambos"))) +
      facet_wrap(~REGION, nrow = 1)
    

    Created on 2024-05-15 with reprex v2.1.0
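
To see what the per-region comparison in the faceted plot amounts to, it can be reproduced outside ggplot2. This is a minimal sketch in base R, assuming geom_signif() applies its default test, wilcox.test(), to the single Pre and Post value inside one facet. The names pre, post, single_p and t_msg are made up for illustration, and the two numbers are the Antofagasta values from the data above.

```r
# One Pre and one Post value for a single region (Antofagasta)
pre  <- 10.24
post <- 7.86

# Two-sample Wilcoxon rank-sum test with one observation per group.
# With n = 1 in each group the rank statistic can only be 0 or 1,
# so the two-sided p-value cannot drop below 1.
single_p <- wilcox.test(pre, post)$p.value
print(single_p)

# A t-test does not run at all with one observation per group;
# the error message is captured here instead of stopping the script.
t_msg <- tryCatch(t.test(pre, post), error = function(e) conditionMessage(e))
print(t_msg)
```

Whatever the two values are, the result is the same, which is why the brackets in each facet carry no information.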

    However, each facet's p-value comes from comparing a single Pre_Ambos value with a single Post_Ambos value (i.e. ~10 vs ~7.5 for the first region, p = ~1), so it isn't informative. This approach is fine if you have multiple values for each region, but if you want to conduct a test for all Pre_Ambos vs all Post_Ambos you need to change your figure, e.g.

    ggplot(Table1_p, aes(x = name, y = value, fill = REGION)) +
      geom_col(position = "dodge") +
      geom_signif(comparisons = list(c("Pre_Ambos", "Post_Ambos")))
    #> Warning in wilcox.test.default(c(11.33, 9.96, 10.24, 14.17, 13.43, 12.96, :
    #> cannot compute exact p-value with ties
    

    Created on 2024-05-15 with reprex v2.1.0
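
One caveat about the pooled comparison: geom_signif() runs an unpaired wilcox.test() by default, whereas each region contributes both a Pre and a Post value. If the pairing matters to you, a paired test can be run directly in base R. This is only a sketch under that assumption, not part of the plots above. The names pre, post and paired_test are made up for illustration, and the vectors are copied from the question in the same region order.

```r
# Pre and Post values in the same region order as in the question.
# The last entry is "Chile", presumably a national aggregate; dropping it
# may be preferable, since it is not independent of the regional values.
pre  <- c(11.33, 9.96, 10.24, 14.17, 13.43, 12.96, 11.47, 14.54, 14.58,
          18.00, 12.19, 15.34, 16.10, 17.64, 16.34, 15.04, 13.96)
post <- c(8.54, 7.60, 7.86, 10.44, 10.01, 11.97, 9.45, 13.07, 13.76,
          11.56, 10.37, 14.48, 13.14, 15.04, 14.74, 12.07, 11.51)

# Paired Wilcoxon signed-rank test: each region's Pre value is compared
# with the same region's Post value
paired_test <- wilcox.test(pre, post, paired = TRUE)
print(paired_test$p.value)
```

Every region has a lower Post than Pre value, so the paired test gives a much smaller p-value than a per-region comparison ever could.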

    Does that make sense?