I have a data frame with multiple columns. Here is an example.
my_df <- data.frame(x = 1:5, y = c(50, 22, 15, 33, 49))
colnames(my_df) <- c("ID", "values")
my_df
I am trying to make a scatterplot in which subsets of this data frame are treated as outliers and drawn in different colors from the non-outliers. On top of this, I also want to label these outliers with their associated values. Here is an example attempt:
ggplot(data=my_df, aes(x = seq(1, length(values)), y = my_df$values))+
geom_point(data = subset(my_df, values > 48), aes(color = "blue"))+
geom_point(data = subset(my_df, values < 24), aes(color = "red"))+
geom_text(data = subset(my_df, values > 48), aes(label = values))
The geom_text line produces this error:
Error: Aesthetics must be either length 1 or the same as the data (2): colour, x, y
Second, I tried using ifelse to color the points by value. That works, but I do not know how to label the points in each color group with their values, or how to add a legend that names each color group. Adding geom_text or a legend to this version has not given me the plot I want. Here is the baseline code that does work:
ggplot(data=my_df, aes(x = seq(1, length(values)), y = my_df$values))+
geom_point(color = ifelse(my_df$values > 25, "red", "blue"))
If anyone can help, I'll be so thankful, as I've been struggling with this for over a week now.
EDIT: The answers provided below have answered my question. This is the code for my resulting plot, including a legend title and names for each variable as a reference for those looking this up afterwards.
# Needs library(tidyverse) and library(ggrepel), as loaded in the answer below
ggplot(my_df, aes(ID, values, color = factor(cut(values, c(0, 24, 48, Inf))))) +
  geom_point(size = 3) +
  geom_text_repel(data = . %>% filter(values > 48), aes(label = values), show.legend = FALSE) +
  geom_text_repel(data = . %>% filter(values < 24), aes(label = values), show.legend = FALSE) +
  labs(title = "Beautiful Scatterplot", x = "ID", y = "Values", color = "Legend Title") +
  scale_color_manual(labels = c("Below 24", "Between 24 and 48", "Above 48"), values = c("blue", "red", "purple"))
You can try
library(tidyverse)
library(ggrepel)
my_df %>%
  mutate(col = case_when(values > 48 ~ 4,
                         values < 24 ~ 2,
                         TRUE ~ 1)) %>%
  ggplot(aes(ID, values, color = factor(col))) +
  geom_point(size = 3) +
  geom_text_repel(data = . %>% filter(values > 48), aes(label = values)) +
  scale_color_identity()
Or compute the color groups inside the ggplot call with cut(), without the mutate step:
ggplot(my_df, aes(ID, values, color = factor(cut(values, c(0, 24, 48, Inf))))) +
  geom_point(size = 3) +
  geom_text_repel(data = . %>% filter(values > 48), aes(label = values), show.legend = FALSE)
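If you would rather not depend on ggrepel or dplyr, here is a rough sketch of the same idea with ggplot2 alone. It swaps geom_text_repel for plain geom_text (so labels can overlap on denser data) and stores the groups in a column named group, which is my own addition. The group names and colors are only placeholders. Note that cut() closes intervals on the right by default, so a value of exactly 24 would land in the first group.

```r
library(ggplot2)

my_df <- data.frame(ID = 1:5, values = c(50, 22, 15, 33, 49))

# 'group' is a hypothetical helper column; breaks match the cut() call above.
# Intervals are (0,24], (24,48], (48,Inf] because cut() uses right = TRUE.
my_df$group <- cut(my_df$values,
                   breaks = c(0, 24, 48, Inf),
                   labels = c("Below 24", "Between 24 and 48", "Above 48"))

# Rows to label: the points treated as outliers on either side.
labelled <- subset(my_df, values > 48 | values < 24)

p <- ggplot(my_df, aes(ID, values, color = group)) +
  geom_point(size = 3) +
  # Plain geom_text instead of geom_text_repel; vjust nudges labels above points.
  geom_text(data = labelled, aes(label = values), vjust = -1, show.legend = FALSE) +
  # Named vector ties each color to its group regardless of level order.
  scale_color_manual(name = "Legend Title",
                     values = c("Below 24" = "blue",
                                "Between 24 and 48" = "red",
                                "Above 48" = "purple"))
print(p)
```

Because the colors are mapped from a column inside aes(), ggplot2 builds the legend on its own, which is what the ifelse version with a fixed color argument cannot do.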