
ggplot: define color for point overlaps


With ggplot2 I want to plot two vectors (vec1_num, vec2_num) in two dimensions and colour the points by a group variable (vec3_char). Some data points are overlapping.

library(ggplot2)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")

# plot 1
ggplot(data = NULL) +
  geom_point(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
  theme(panel.grid = element_blank())

I know I can attenuate the overlap by reducing alpha, or by using geom_jitter to add a bit of noise, like this:

# plot 2
ggplot(data = NULL) +
  geom_jitter(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4, width = 0.1) +
  scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
  theme(panel.grid = element_blank())

However, is it possible to keep plot 1 but colour the overlapping points differently, so that, for example, "A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C" = "orange"? And can I additionally add a small Venn diagram as a legend that visualises the colour choice for the overlapping points?

Thanks!


Solution

  • My way of doing this would be to convert the letters into numbers, sum them per point, and convert the sums back into letters.

    NB: The one complication is that the letters need to be A, B, D, H, ... (positions 1, 2, 4, 8, ... in the alphabet), so that every combination of letters adds up to a different number. There is probably also a way to start with A, B, C, ... and encode them to unique values.

    library(tidyverse)
    vec1_num = c(1,2,3,4,1,3,4,5,5,5)
    vec2_num = c(1,2,3,4,1,3,4,5,5,5)
    vec3_char = c("A", "B", "D", "A", "B", "D", "D", "A", "B", "D")
    
    # Collapse runs of repeated characters in a string, e.g. "44" -> "4"
    removeDup <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse = "")
    
    data <- data.frame(x = vec1_num, y = vec2_num, col = match(vec3_char, LETTERS))
    
    data <- data %>% 
      group_by(x, y) %>%  # a point is defined by both coordinates
      mutate(colour = glue::glue_collapse(col, sep = "")) %>%
      select(-col) %>% 
      distinct(x, y, .keep_all = TRUE) %>% 
      mutate(colour = removeDup(colour)) %>%
      mutate(colour = sapply(str_extract_all(colour, '\\d'), function(x) sum(as.integer(x)))) %>% 
      mutate(colour = case_when(
        colour == 1 ~ "A",
        colour == 2 ~ "B",
        colour == 3 ~ "AB",
        colour == 4 ~ "D",
        colour == 5 ~ "AD",
        colour == 6 ~ "BD",
        colour == 7 ~ "ABD"
      ))
    
    # plot 1
    ggplot(data) +
      geom_point(aes(x=x, y=y, colour = as_factor(colour)), alpha=0.4, size=4) +
      geom_text(aes(x = x, y = y, label = colour), vjust = 2) +
      scale_colour_manual(
        values = c("A" = "darkblue", "B" = "darkred", "AB" = "orange", "D" = "green",
                   "AD" = "black", "BD" = "pink", "ABD" = "purple"),  # "BD" no longer shares a colour with "AB"
        name = "Colour"
      ) +
      theme(panel.grid = element_blank())
    

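The NB in the answer wonders whether the original labels A, B, C could be kept. One possible way, as a minimal sketch rather than a tested replacement for the code above: give each group its own power of two (A = 1, B = 2, C = 4) instead of relying on alphabet positions, sum each weight once per point, and decode the sum back into letters. The names `groups`, `weights`, `overlap` and `decode` are illustrative, and the colour for "AC" is an arbitrary choice because the question does not specify one.

```r
library(ggplot2)

vec1_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec2_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec3_char <- c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")

# Assign each group a distinct power of two: A = 1, B = 2, C = 4.
groups  <- sort(unique(vec3_char))
weights <- setNames(2^(seq_along(groups) - 1), groups)

points <- data.frame(x = vec1_num, y = vec2_num,
                     w = unname(weights[vec3_char]))

# Count each group once per (x, y) position; every sum then identifies
# exactly one combination of groups.
overlap <- aggregate(w ~ x + y, data = points,
                     FUN = function(w) sum(unique(w)))

# Decode a sum back into letters by checking which weights it contains,
# e.g. 5 = 1 + 4 decodes to "AC".
decode <- function(code) {
  paste(groups[(code %/% weights) %% 2 == 1], collapse = "")
}
overlap$label <- vapply(overlap$w, decode, character(1))

# Colours for A, B, C, AB, BC, ABC follow the question; "AC" is assumed.
ggplot(overlap) +
  geom_point(aes(x = x, y = y, colour = label), size = 4) +
  scale_colour_manual(
    values = c(A = "darkblue", B = "darkred", C = "orange",
               AB = "black", AC = "green", BC = "pink", ABC = "grey"),
    name = "Groups"
  ) +
  theme(panel.grid = element_blank())
```

Because the decoding is generic, no `case_when` lookup table is needed, and adding a fourth group only requires another entry in the colour vector for each new combination.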