Tags: r, dataframe, ggplot2, colors, geom-text

Scatterplot with different colored outliers and geom text number labels


I have a data frame with multiple columns. Here is an example.

my_df <- data.frame(x = 1:5, y = c(50, 22, 15, 33, 49))
colnames(my_df) <- c("ID", "values")
my_df

I am trying to make a scatterplot in which subsets of this data frame, the outliers, are colored differently from the non-outliers. On top of this, I also want to label these outliers with their associated values. Here is an example attempt:

ggplot(data = my_df, aes(x = seq(1, length(values)), y = my_df$values)) +
  geom_point(data = subset(my_df, values > 48), aes(color = "blue")) +
  geom_point(data = subset(my_df, values < 24), aes(color = "red")) +
  geom_text(data = subset(my_df, values > 48), aes(label = values))

The geom_text line of code produces this error:

Error: Aesthetics must be either length 1 or the same as the data (2): colour, x, y

Secondly, as a different attempt, I have tried using ifelse to assign colors by value. However, I do not know how to label the differently colored groups with their numbers, or how to add a legend with a name for each color group. Adding geom_text or a legend to this version has not worked for me. Here is the code that works as a baseline:

ggplot(data=my_df, aes(x = seq(1, length(values)), y = my_df$values))+
  geom_point(color = ifelse(my_df$values > 25, "red", "blue"))

If anyone can help, I'll be so thankful, as I've been struggling with this for over a week now.

EDIT: The answers provided below have answered my question. This is the code for my resulting plot, including a legend title and names for each variable as a reference for those looking this up afterwards.

library(ggplot2)
library(dplyr)    # for %>% and filter()
library(ggrepel)  # for geom_text_repel()

ggplot(my_df, aes(ID, values, color = factor(cut(values, c(0,24,48,Inf))))) +
  geom_point(size=3) +
  geom_text_repel(data = . %>% filter(values > 48), aes(label = values), show.legend = FALSE) +
  geom_text_repel(data = . %>% filter(values < 24), aes(label = values), show.legend = FALSE) +
  labs(title = "Beautiful Scatterplot", x = "ID", y = "Values", color = "Legend Title") +
  scale_color_manual(labels = c("Below 24", "Between 24 and 48", "Above 48"), values = c("blue", "red", "purple"))
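
For reference, the labels and colors passed to scale_color_manual() are matched in order against the factor levels that cut() produces. The short sketch below uses only base R and the example data from the question to show which bin each value lands in. Intervals are closed on the right by default, so a value of exactly 24 would end up in the "Below 24" group.

```r
# Example data from the question.
values <- c(50, 22, 15, 33, 49)

# Same breaks as in the plot code; cut() returns a factor with one level per interval.
bins <- cut(values, breaks = c(0, 24, 48, Inf))

levels(bins)  # "(0,24]" "(24,48]" "(48,Inf]"
table(bins)   # 2 values in (0,24], 1 in (24,48], 2 in (48,Inf]
```

The first label and color therefore apply to (0,24], the second to (24,48], and the third to (48,Inf].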

[Figure: example answer scatterplot]


Solution

  • You can try

    library(tidyverse)
    library(ggrepel)
    my_df %>% 
      mutate(col=case_when(values > 48 ~ 4,
                           values < 24 ~ 2,
                           TRUE ~ 1)) %>% 
      ggplot(aes(ID, values, color = factor(col))) +
       geom_point(size=3) + 
       geom_text_repel(data = . %>% filter(values> 48), aes(label = values)) + 
       scale_color_identity()
    


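    Or bin the values with cut() directly inside aes(), which skips the mutate() step (ggrepel and dplyr's filter() are still needed):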

      ggplot(my_df, aes(ID, values, color = factor(cut(values, c(0,24,48,Inf))))) +
       geom_point(size=3) + 
       geom_text_repel(data = . %>% filter(values> 48), aes(label = values), show.legend = FALSE)
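
As a note on the error in the question: it most likely comes from aes(y = my_df$values), which always supplies all 5 values, while the layers built on subset() data have only 2 rows. Referring to columns by bare name avoids the mismatch, because each layer then looks the column up in its own data. Below is a minimal sketch under that assumption. It uses only ggplot2, with plain geom_text in place of ggrepel. The `group` column and the `outliers` data frame are names introduced here for illustration.

```r
library(ggplot2)

# Example data from the question.
my_df <- data.frame(ID = 1:5, values = c(50, 22, 15, 33, 49))

# Bin the values once so that points and legend share the same grouping.
# `group` is a hypothetical column name chosen for this sketch.
my_df$group <- cut(my_df$values,
                   breaks = c(0, 24, 48, Inf),
                   labels = c("Below 24", "Between 24 and 48", "Above 48"))

# Rows to label: everything outside the middle band.
outliers <- subset(my_df, values > 48 | values < 24)

p <- ggplot(my_df, aes(x = ID, y = values, color = group)) +
  geom_point(size = 3) +
  # Columns are referenced by name, so the inherited x and y work with the subset too.
  geom_text(data = outliers, aes(label = values), vjust = -1, show.legend = FALSE) +
  scale_color_manual(name = "Legend Title",
                     values = c("Below 24" = "blue",
                                "Between 24 and 48" = "red",
                                "Above 48" = "purple"))
```

Calling print(p) draws the plot. On denser data the labels can overlap neighboring points, which is where geom_text_repel from the answer above helps.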