Adding labels to outliers for a single boxplot using geom_text() in ggplot2


I have a single boxplot in R of percentage-correct values (y-axis), where each point represents a different participant. I want to label my three outliers with their participant ID (Pt_ID). To do this, I added a column, outlier, to my data frame; it holds the Pt_ID for outliers and NA for everyone else.

#Create function to identify outliers in terms of % correct
findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}

#Add a column to identify which participants are outliers
performance_tibble <- performance_tibble %>%
        mutate(outlier = ifelse(findoutlier(Perc_Correct), Pt_ID, NA))

#Plot boxplot of %correct including outliers labelled with Pt_ID
ggplot(performance_tibble) +
  geom_boxplot(aes(y = Perc_Correct), outlier.colour = "red") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

I have looked at other posts and tried adding + geom_text(aes(label = outlier)), but this fails with an error saying that geom_text() requires both x and y aesthetics (and I only have a y variable, as it is a single boxplot). Can anyone suggest how to label these outliers without having to specify an x aesthetic?

I have added an image of the boxplot with the outliers in red.


Solution

  • geom_text() needs both x and y, so you'll have to supply a dummy constant for x (here x = 1). It's also easier to move the aesthetics into ggplot() so that they are inherited by all the layers. The only other change is to remove the x-axis title that then appears. With some dummy data, that gives:

    library(dplyr)    # mutate(), %>%
    library(tibble)   # tibble()
    library(ggplot2)

    # Flag values more than 1.5 * IQR below Q1 or above Q3
    findoutlier <- function(x) {
      return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
    }

    # Dummy data
    set.seed(0)
    performance_tibble <- tibble(Perc_Correct = -rlnorm(30), Pt_ID = sample(1:3, 30, TRUE))

    # Add a column holding Pt_ID for outliers and NA otherwise
    performance_tibble <- performance_tibble %>%
      mutate(outlier = ifelse(findoutlier(Perc_Correct), Pt_ID, NA))

    # Plot boxplot of % correct, with outliers labelled by Pt_ID
    # (na.rm = TRUE silences the warning about the rows with NA labels)
    ggplot(performance_tibble, aes(y = Perc_Correct, x = 1)) +
      geom_boxplot(outlier.colour = "red") +
      geom_text(aes(label = outlier), nudge_x = 0.01, na.rm = TRUE) +
      theme(axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            axis.title.x = element_blank())
    

    Output plot with point labels
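
If you would rather not carry NA labels at all, one variation is to hand the label layer only the outlier rows through its data argument. This is a minimal sketch, not part of the original answer: the scores data frame, the P01 to P06 IDs and the is_outlier column are made up for illustration, and it uses base R subsetting so that only ggplot2 is needed.

```r
library(ggplot2)

# Same 1.5 * IQR rule as in the question.
findoutlier <- function(x) {
  x < quantile(x, .25) - 1.5 * IQR(x) | x > quantile(x, .75) + 1.5 * IQR(x)
}

# Hypothetical data: six participants, one with a clearly low score.
scores <- data.frame(
  Pt_ID = c("P01", "P02", "P03", "P04", "P05", "P06"),
  Perc_Correct = c(92, 95, 90, 94, 93, 40)
)
scores$is_outlier <- findoutlier(scores$Perc_Correct)

# Keep only the outlier rows; these are the only rows the label layer sees.
outliers <- scores[scores$is_outlier, ]

p <- ggplot(scores, aes(x = 1, y = Perc_Correct)) +
  geom_boxplot(outlier.colour = "red") +
  # x and y are inherited from ggplot(); only the label is set here.
  geom_text(data = outliers, aes(label = Pt_ID), nudge_x = 0.03, hjust = 0) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank())

print(p)
```

Because the label layer reads Pt_ID directly, the separate outlier column of NA-padded IDs is not needed here. hjust = 0 left-aligns each label so that it starts just to the right of its point.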