Any suggestions on how to solve this problem? Other similar questions here on the site present a solution for a single variable used as a factor, but my case is different: I would like to see the labels of the outliers for multiple variables.
For example, I have the following chart, which was created with this command:
library(ggplot2)
library(reshape2)  # melt()
library(dplyr)     # %>%

z_mtcars <- data.frame(scale(mtcars[-12]))
z_mtcars$type <- rownames(mtcars)
z_mtcars %>%
  melt(id.vars = "type") %>%
  ggplot() +
  aes(x = variable, y = value, fill = as.numeric(variable)) +
  geom_boxplot() +
  scale_fill_distiller(palette = "Blues") +
  scale_alpha(range = c(1, 1)) +
  ggtitle("Boxplot: Standardized Score (Z-Scale) ") +
  xlab("Variables") +
  ylab("Value") +
  labs(fill = "Order of \nVariables") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 1, linetype = "dotted", color = "blue") +
  theme(legend.position = "left")
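The reason the chart cannot label the points on its own is that the boxplot layer only keeps the y-values of the outliers, not the rows they came from. A minimal sketch to inspect this, assuming ggplot2 and reshape2 as in the question (`box_stats` is just an illustrative name):

```r
library(ggplot2)
library(reshape2)

z_mtcars <- data.frame(scale(mtcars))
z_mtcars$type <- rownames(mtcars)

p <- ggplot(melt(z_mtcars, id.vars = "type"), aes(x = variable, y = value)) +
  geom_boxplot()

# One row per box; the 'outliers' list-column holds only numeric y-values,
# so the car names are not available to the boxplot layer.
box_stats <- layer_data(p, 1)
box_stats[, c("x", "outliers")]
```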
Here is what I tried. I simplified your code a bit to highlight the point you are asking about. You need the label information for the outliers. You can identify outliers with the borrowed function below; once they are identified, add the car names to a new column called outlier, and then use that column in geom_text_repel() from the ggrepel package.
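To check which car names will become labels before drawing anything, a minimal sketch (assuming dplyr and tidyr; `outlier_table` is a hypothetical name) applies the same 1.5 * IQR test per variable and keeps only the flagged rows. The test is redefined inside the block so it runs on its own:

```r
library(dplyr)
library(tidyr)

# Tukey's rule: below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

# mtcars has 11 columns, so this matches scale(mtcars[-12]) in the question
z_mtcars <- data.frame(scale(mtcars))
z_mtcars$type <- rownames(mtcars)

outlier_table <- z_mtcars %>%
  pivot_longer(-type, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%          # fences are computed per variable
  filter(is_outlier(value)) %>%
  ungroup() %>%
  arrange(variable, desc(abs(value)))

outlier_table
```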
library(tidyverse)
library(ggrepel)
z_mtcars <- data.frame(scale(mtcars[-12]))
z_mtcars$type <- rownames(mtcars)
I borrowed this function from this question. Credit goes to JasonAizkalns.
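# TRUE for values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], the same default
# whisker rule that geom_boxplot() uses to decide which points to draw
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}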
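A quick sanity check of the helper on a toy vector (the values are made up for illustration): with the default quantile type, Q1 = 2 and Q3 = 4, so IQR = 2 and the fences are -1 and 7, which leaves only 100 outside them.

```r
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

x <- c(1, 2, 3, 4, 100)
flags <- is_outlier(x)
flags     # FALSE FALSE FALSE FALSE TRUE
x[flags]  # 100
```

Inside group_by(variable) %>% mutate(...), the quantiles are recomputed for each variable, which is what makes the labelling work across multiple variables.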
z_mtcars %>%
  pivot_longer(-type, names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  mutate(outlier = if_else(is_outlier(value), type, NA_character_)) %>%
  ggplot(aes(x = variable, y = value, color = variable)) +
  geom_boxplot() +
  geom_text_repel(aes(label = outlier), na.rm = TRUE, show.legend = FALSE)
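If you want to keep the styling from the question (fill gradient, reference lines, rotated axis labels), the same idea can be combined with it. This is a sketch assuming tidyverse and ggrepel; `plot_data` and `p` are illustrative names. Converting `variable` to a factor is a design choice: pivot_longer() returns a character column, which ggplot2 would sort alphabetically, whereas melt() kept the original column order.

```r
library(tidyverse)
library(ggrepel)

is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

z_mtcars <- data.frame(scale(mtcars))
z_mtcars$type <- rownames(mtcars)

plot_data <- z_mtcars %>%
  pivot_longer(-type, names_to = "variable", values_to = "value") %>%
  # keep the mtcars column order on the x axis
  mutate(variable = factor(variable, levels = names(mtcars))) %>%
  group_by(variable) %>%
  mutate(outlier = if_else(is_outlier(value), type, NA_character_)) %>%
  ungroup()

p <- ggplot(plot_data, aes(x = variable, y = value)) +
  # fill is mapped only for the boxes, so the text layer does not inherit it
  geom_boxplot(aes(fill = as.numeric(variable))) +
  geom_text_repel(aes(label = outlier), na.rm = TRUE, size = 3) +
  scale_fill_distiller(palette = "Blues") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 1, linetype = "dotted", color = "blue") +
  labs(title = "Boxplot: Standardized Score (Z-Scale)",
       x = "Variables", y = "Value", fill = "Order of \nVariables") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.position = "left")

p
```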