Tags: r, label, lapply, boxplot, outliers

labeling box-plot outliers in R


I have a data frame with 130 rows (person IDs) and 169 columns (gene names), holding the expression of each gene per person. It looks like the table below, and I have been able to create a boxplot for each gene using the code that follows it:

| ID|  gene X  |  gene Y  |  gene Z |
| A |   0.50   |   0.78   |   0.86  |
| B |   0.45   |   0.52   |   0.94  |
| C |   0.48   |   0.53   |   0.05  |

lapply(seq_along(tpose_genexp), function(x){
  boxplot(tpose_genexp[[x]],
          horizontal = FALSE, # Horizontal or vertical plot
          lwd = 2,            # Line width
          col = rgb(1, 0, 0, alpha = 0.4), # Color
          main = colnames(tpose_genexp)[x], # Gene name as the title
          notch = TRUE,
          border = "black",
          outpch = 25,       # Outliers symbol
          outbg = "green",   # Outliers color
          whiskcol = "blue", # Whisker color
          whisklty = 2,      # Whisker line type
          lty = 1            # Line type (box and median)
          #, outl            # incomplete argument in the posted code
  )
})

This gives me 169 boxplots. I am trying to figure out how to label the outliers with the person IDs, which are the row names.


I tried using lapply and a few other options I found but I couldn't seem to get any of them to work.


Solution

  • Here is a way to plot the outliers' row names.
    I will use the data set mpg from package ggplot2 because, among the numeric columns kept below, some have outliers and one does not. The subsetting code before the lapply loop only serves to make the example reproducible.

    data(mpg, package = "ggplot2")
    i_num <- which(sapply(mpg, is.numeric))
    str(mpg[i_num])
    #> Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  5 variables:
    #>  $ displ: num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
    #>  $ year : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
    #>  $ cyl  : int  4 4 4 4 6 6 6 4 4 4 ...
    #>  $ cty  : int  18 21 20 21 16 18 18 18 16 20 ...
    #>  $ hwy  : int  29 29 31 30 26 26 27 26 25 28 ...
    i_num <- i_num[c(1, 4, 5)]
    
    lapply(names(mpg[i_num]), \(x) {
      bp <- boxplot(mpg[[x]], 
                    horizontal = FALSE, # Horizontal or vertical plot
                    lwd = 2, # Lines width
                    col = rgb(1, 0, 0, alpha = 0.4), # Color
                    main = x,
                    notch = TRUE, 
                    border = "black",
                    outpch = 25,       # Outliers symbol
                    outbg = "green",   # Outliers color
                    whiskcol = "blue", # Whisker color
                    whisklty = 2,      # Whisker line type
                    lty = 1            # Line type (box and median)
                    #, outl            # ??? (it's in the question's code)
      )
      # Rows whose value is among the outliers reported by boxplot()
      i_row <- which(mpg[[x]] %in% bp$out)
      if(length(i_row)) {
        # One label per distinct outlier value; tied rows are pasted together.
        # tapply() sorts its groups, so the y positions are taken from its
        # names to keep each label next to its own point.
        labs <- tapply(row.names(mpg)[i_row], mpg[[x]][i_row], paste, collapse = ", ")
        text(1.1, as.numeric(names(labs)), labels = labs, pos = 4)
      }
    })
    

    #> [[1]]
    #> NULL
    #> 
    #> [[2]]
    #> NULL
    #> 
    #> [[3]]
    #> NULL
    

    Created on 2023-02-25 with reprex v2.0.2
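
The same idea can be sketched on data shaped like the question's, with person IDs as row names and one column per gene. The data below is simulated purely for illustration, and the helper name `outlier_labels` is hypothetical. It relies on `boxplot.stats()`, which applies the same default 1.5 × IQR rule as `boxplot()`, so the labels should match the points drawn as outliers as long as `range` is left at its default.

```r
# Simulated stand-in for the question's data (values and IDs are made up):
# rows are person IDs, columns are genes.
set.seed(1)
tpose_genexp <- as.data.frame(matrix(rnorm(30 * 3, mean = 0.5, sd = 0.1), ncol = 3))
names(tpose_genexp) <- c("geneX", "geneY", "geneZ")
row.names(tpose_genexp) <- paste0("P", seq_len(30))
tpose_genexp["P7", "geneZ"] <- 0.05   # plant one obvious outlier

# Hypothetical helper: one label per distinct outlying value,
# named by that value so the caller knows where to draw it.
outlier_labels <- function(values, ids) {
  out <- boxplot.stats(values)$out    # default rule, same as boxplot()
  hit <- which(values %in% out)
  if (!length(hit)) return(character(0))
  lab <- tapply(ids[hit], values[hit], paste, collapse = ", ")
  setNames(as.vector(lab), names(lab))
}

# One boxplot per gene, with outliers labeled by row name.
for (gene in names(tpose_genexp)) {
  boxplot(tpose_genexp[[gene]], main = gene)
  labs <- outlier_labels(tpose_genexp[[gene]], row.names(tpose_genexp))
  if (length(labs)) {
    text(1.1, as.numeric(names(labs)), labels = labs, pos = 4)
  }
}
```

Keeping the label computation in its own function makes it possible to check the IDs without looking at the plots, for example with `outlier_labels(tpose_genexp$geneZ, row.names(tpose_genexp))`.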