
How do I rename the labels of my individual boxplots in R?


I created a grouped boxplot with ggplot2, which required reshaping the data frame from wide to long format. Now I want to change the labels of the individual boxes on the x axis. I tried xlim(labels = c("A", "B")), which changes the labels but produces warnings, and the boxplots are no longer computed or displayed.

    data3 %>% 
      select(Treatment_A, Treatment_B) %>%
      pivot_longer(cols = everything(), names_to = "Variable",values_to = "Value") %>%
      ggplot(aes(x= Variable, y = Value, fill = Variable)) +
      stat_boxplot(geom= "errorbar", width= 0.5) +
      geom_boxplot(fill=c("skyblue2", "gold2"))+
      labs(x="Treatment", y="Frequency (total)", title="Treatment comparison")+
      theme_minimal()+
      theme(axis.title.x = element_text(size =10))+
      theme(axis.title.y = element_text(size =10))+
      theme(plot.title=element_text(size = 16))+
      theme(plot.title = element_text(hjust = 0.5))+
      stat_summary(fun = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
               width = .75,lwd = 0.7, linetype = "dashed", col = "red") +
      xlim(labels= c("A", "B"))+ # <- This gives me the error
      ylim(0,30)

Warning messages:

1: Removed 2730 rows containing missing values (stat_boxplot). 
2: Removed 2730 rows containing missing values (stat_boxplot). 
3: Removed 2730 rows containing non-finite values (stat_summary). 
4: In max(f) : no non-missing arguments to max; returning -Inf
5: Computation failed in `stat_summary()`:
argument must be coercible to non-negative integer

I am still very new to R and need the graphic for my master's thesis, so I am grateful for any help. Many thanks in advance!


Solution

  • The issue is that xlim is shorthand for setting the limits of the x scale. With xlim(labels = c("A", "B")) you set the limits to c("A", "B"), so only observations whose x value is "A" or "B" are kept in the plot. Your x values are "Treatment_A" and "Treatment_B", so every observation is dropped, and that is what the warnings are telling you.

    To assign labels use the labels argument of scale_x_discrete instead.

    Using some fake random example data:

    library(tidyverse)
    
    set.seed(123)
    
    data3 <- data.frame(
      Treatment_A = runif(100, 0, 30),
      Treatment_B = runif(100, 0, 30)
    )
    
    p <- data3 %>%
      select(Treatment_A, Treatment_B) %>%
      pivot_longer(cols = everything(), names_to = "Variable", values_to = "Value") %>%
      ggplot(aes(x = Variable, y = Value, fill = Variable)) +
      stat_boxplot(geom = "errorbar", width = 0.5) +
      geom_boxplot(fill = c("skyblue2", "gold2")) +
      labs(x = "Treatment", y = "Frequency (total)", title = "Treatment comparison") +
      theme_minimal() +
      theme(axis.title.x = element_text(size = 10)) +
      theme(axis.title.y = element_text(size = 10)) +
      theme(plot.title = element_text(size = 16)) +
      theme(plot.title = element_text(hjust = 0.5)) +
      stat_summary(
        fun = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
        width = .75, lwd = 0.7, linetype = "dashed", col = "red"
      ) +
      ylim(0, 30)
    

    First, to reproduce your issue, add xlim(labels = c("A", "B")):

    p +
      xlim(labels = c("A", "B"))
    #> Warning: The dot-dot notation (`..y..`) was deprecated in ggplot2 3.4.0.
    #> ℹ Please use `after_stat(y)` instead.
    #> This warning is displayed once every 8 hours.
    #> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
    #> generated.
    #> Warning: Removed 200 rows containing missing values (`stat_boxplot()`).
    #> Warning in min(x): no non-missing arguments to min; returning Inf
    #> Warning in max(x): no non-missing arguments to max; returning -Inf
    #> Warning in min(diff(sort(x))): no non-missing arguments to min; returning Inf
    #> Warning: Removed 200 rows containing missing values (`stat_boxplot()`).
    #> Warning: Removed 200 rows containing non-finite values (`stat_summary()`).
    #> Warning in max(f): no non-missing arguments to max; returning -Inf
    #> Warning: Computation failed in `stat_summary()`
    #> Caused by error in `seq_len()`:
    #> ! argument must be coercible to non-negative integer
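
    To see that the limits filter by value rather than relabel, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not your data). Limits that match the values in the data keep both groups; limits that match nothing drop every row, as in the warnings above.

```r
library(ggplot2)

# Hypothetical toy data in the same long format as above
toy <- data.frame(
  Variable = rep(c("Treatment_A", "Treatment_B"), each = 5),
  Value = 1:10
)

base <- ggplot(toy, aes(x = Variable, y = Value)) +
  geom_boxplot()

# Limits that match the values in the data: both groups are kept
p_kept <- base + xlim("Treatment_A", "Treatment_B")

# Limits that match nothing in the data: printing this plot drops
# every row and raises warnings like the ones shown above
p_dropped <- base + xlim("A", "B")

# One row of boxplot statistics per group that survived the limits
kept_stats <- layer_data(p_kept)
nrow(kept_stats)
```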
    

    Using scale_x_discrete instead gives the desired result:

    p +
      scale_x_discrete(labels = c("A", "B"))
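
    Two variations may be worth considering; both are sketches on the same kind of fake data, not the only way to do it. An unnamed labels vector is matched by position, so it silently mislabels the boxes if the order of the groups changes. A named vector ties each label to its data value instead. Alternatively, the names_prefix argument of pivot_longer can strip the "Treatment_" prefix while reshaping, so no relabelling is needed at all.

```r
library(ggplot2)
library(tidyr)

set.seed(123)

# Fake stand-in for data3, as in the example above
data3 <- data.frame(
  Treatment_A = runif(100, 0, 30),
  Treatment_B = runif(100, 0, 30)
)

long <- pivot_longer(
  data3,
  cols = everything(), names_to = "Variable", values_to = "Value"
)

# Option 1: a named vector maps data values to labels, independent of order
p_named <- ggplot(long, aes(x = Variable, y = Value)) +
  geom_boxplot() +
  scale_x_discrete(labels = c(Treatment_B = "B", Treatment_A = "A"))

# Option 2: drop the common prefix during reshaping, so the values
# of Variable are already "A" and "B"
long_short <- pivot_longer(
  data3,
  cols = everything(), names_to = "Variable",
  names_prefix = "Treatment_", values_to = "Value"
)

p_short <- ggplot(long_short, aes(x = Variable, y = Value)) +
  geom_boxplot()

unique(long_short$Variable)
```

    Option 2 changes the data itself, so any legend or facet that uses Variable picks up the short names as well.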