I have a data frame that contains 130 rows (the person IDs) and 169 columns (gene names). For example, my data frame looks like this, with the expression of each gene per person:
| ID | gene X | gene Y | gene Z |
|----|--------|--------|--------|
| A | 0.50 | 0.78 | 0.86 |
| B | 0.45 | 0.52 | 0.94 |
| C | 0.48 | 0.53 | 0.05 |
I have been able to create a boxplot for each gene using the following code:
lapply(seq_along(tpose_genexp), function(x){
  boxplot(tpose_genexp[[x]],
          horizontal = FALSE, # Horizontal or vertical plot
          lwd = 2, # Lines width
          col = rgb(1, 0, 0, alpha = 0.4), # Color
          main = paste("", colnames(tpose_genexp))[[x]],
          notch = TRUE,
          border = "black",
          outpch = 25, # Outliers symbol
          outbg = "green", # Outliers color
          whiskcol = "blue", # Whisker color
          whisklty = 2, # Whisker line type
          lty = 1, # Line type (box and median)
          outl)
})
This has given me 169 boxplots. I am trying to figure out how to label the outliers with the IDs, which are the row names.
I tried using lapply and a few other options I found but I couldn't seem to get any of them to work.
Here is a way to label the outliers with their row names.
I will use the data set mpg in package ggplot2 because some numeric columns have outliers and one of them does not. The data subsetting code prior to the lapply loop is only meant to make the code reproducible.
data(mpg, package = "ggplot2")
i_num <- which(sapply(mpg, is.numeric))
str(mpg[i_num])
#> Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 5 variables:
#> $ displ: num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
#> $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
#> $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
#> $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
#> $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
i_num <- i_num[c(1, 4, 5)]
lapply(names(mpg[i_num]), \(x) {
  # boxplot() invisibly returns its statistics; bp$out holds the outlier values
  bp <- boxplot(mpg[[x]],
    horizontal = FALSE, # Horizontal or vertical plot
    lwd = 2, # Lines width
    col = rgb(1, 0, 0, alpha = 0.4), # Color
    main = x,
    notch = TRUE,
    border = "black",
    outpch = 25, # Outliers symbol
    outbg = "green", # Outliers color
    whiskcol = "blue", # Whisker color
    whisklty = 2, # Whisker line type
    lty = 1 # Line type (box and median)
    #, outl # ??? (it's in the question's code)
  )
  # Rows whose value was flagged as an outlier
  i_row <- which(mpg[[x]] %in% bp$out)
  # One label per distinct outlier value; tied rows are pasted together
  labs <- if(length(i_row)) {
    tapply(row.names(mpg)[i_row], bp$out, paste, collapse = ", ")
  } else ""
  # tapply() orders its result by increasing outlier value,
  # so the y positions are sorted to match the labels
  text(1.1, sort(unique(bp$out)), labels = labs, pos = 4)
})
#> [[1]]
#> NULL
#>
#> [[2]]
#> NULL
#>
#> [[3]]
#> NULL
Created on 2023-02-25 with reprex v2.0.2
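
To connect this back to the question, here is a minimal sketch of the same idea on a data frame shaped like the one described there, with person IDs as row names and one column per gene. The data are made up for illustration; the name tpose_genexp is borrowed from the question, while the gene columns, the "ID" labels and the out_ids list are assumptions of mine. Unlike the code above, it draws one label per outlier rather than collapsing ties, which keeps the label positions trivially aligned with the points.

```r
# Made-up stand-in for the question's data: row names are person IDs,
# columns are genes. Values are invented for illustration only.
set.seed(1)
tpose_genexp <- data.frame(
  geneX = c(rnorm(20, mean = 0.5, sd = 0.05), 0.95),  # last value is extreme
  geneY = c(rnorm(20, mean = 0.6, sd = 0.05), 0.10),  # last value is extreme
  geneZ = seq(0.40, 0.60, length.out = 21)            # evenly spaced, no outliers
)
row.names(tpose_genexp) <- paste0("ID", seq_len(nrow(tpose_genexp)))

# One boxplot per column; each outlier is labelled with its row name.
# The IDs of the outliers are also returned so they can be inspected later.
out_ids <- lapply(names(tpose_genexp), function(x) {
  bp <- boxplot(tpose_genexp[[x]], main = x)
  # Rows whose value boxplot() flagged as an outlier
  i_row <- which(tpose_genexp[[x]] %in% bp$out)
  if (length(i_row)) {
    text(1.1, tpose_genexp[[x]][i_row],
         labels = row.names(tpose_genexp)[i_row], pos = 4)
  }
  row.names(tpose_genexp)[i_row]
})
names(out_ids) <- names(tpose_genexp)
```

The returned list can then be checked per gene, for example with out_ids$geneX. If several persons share the same outlying value, their labels will overlap; in that case the tapply approach above, which pastes tied IDs into one label, is the better choice.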