r ggplot2 ggrepel

Automatic outlier labeling in ggplot


I have used ggplot in a loop to generate scatter plots for each of my 200 variables (V1, V2, etc.). To make the scatter plots clearer, I would like to label the outliers automatically: for each variable, I want to label the points whose value is greater than that variable's 95th percentile.

I tried the code from "Label points in geom_point", but that is a manual approach to labeling outliers because the threshold is hard-coded. I have about 200 variables and cannot specify a threshold for each of them.

The closest solution I could find is the one from the link above, where county_list is the list of variables that I'm looping over:

    ggplot(nba, aes(x= county_list[i], y= Afd_2017, colour="green", label=Name))+
    geom_point() +
    geom_text(aes(label=ifelse(value_of_V[i]>24,as.character(Name),'')),hjust=0,vjust=0)

What I would like is something like this:

    ggplot(nba, aes(x= county_list[i], y= Afd_2017, colour="green", label=Name))+
    geom_point() +
    geom_text(aes(label=ifelse(value_of_V[i] > <95th percentile of value_of_V[i]>,
    as.character(Name),'')),hjust=0,vjust=0)

Solution

  • You could create a list of plots using lapply/map, computing each column's 95th percentile with quantile() inside the function:

    library(ggplot2)
    
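    # Build one plot per column of nba, skipping the first column (Name)
    list_plots <- lapply(nba[-1], function(data) 
         ggplot(nba, aes(x = MIN, y = data, label = Name)) +
         # A constant colour belongs outside aes(); inside aes() it is
         # treated as a data value and only creates a legend entry
         geom_point(colour = "green") +
         # Label only the points above this column's own 95th percentile
         geom_text(aes(label = ifelse(data > quantile(data, 0.95),
         as.character(Name), '')), hjust = 0, vjust = 0))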
    

    Then you can access individual plots by subsetting the list with [[:

    list_plots[[6]]
    


    list_plots[[7]]
    

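If you would rather loop over column names, as in the question, the same idea can be sketched as below. This is a minimal sketch, not a definitive implementation: the data frame `toy`, its columns, and the helper `label_outliers()` are made up for illustration and stand in for your own data. `na.rm = TRUE` is an added assumption so that missing values do not make `quantile()` fail.

```r
library(ggplot2)

# Hypothetical toy data standing in for the real data frame
set.seed(1)
toy <- data.frame(
  Name     = paste0("P", 1:40),
  Afd_2017 = rnorm(40),
  V1       = rexp(40),
  V2       = runif(40)
)

# Hypothetical helper: keep the label where the value exceeds the chosen
# quantile of its own column, and return "" everywhere else
label_outliers <- function(values, labels, prob = 0.95) {
  cutoff <- quantile(values, prob, na.rm = TRUE)
  ifelse(!is.na(values) & values > cutoff, as.character(labels), "")
}

vars <- c("V1", "V2")

# lapply() gives each plot its own copy of v; a plain for loop would leave
# every plot pointing at the last variable, because aes() is evaluated lazily
plots <- lapply(setNames(vars, vars), function(v) {
  d <- toy
  d$outlier_label <- label_outliers(d[[v]], d$Name)
  ggplot(d, aes(x = .data[[v]], y = Afd_2017)) +
    geom_point(colour = "green") +
    geom_text(aes(label = outlier_label), hjust = 0, vjust = 0) +
    labs(x = v)
})
```

`plots[["V1"]]` then prints the plot for V1. If the labels overlap and ggrepel is installed, replacing `geom_text(...)` with `ggrepel::geom_text_repel(aes(label = outlier_label))` should spread them out.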