Draw segments connecting all possible pairs of data points, coloured by the sign of the segment slope (plotting Kendall's tau)


My question relates to this article by Davis and Chen (2006), which shows a way to visualise Kendall's tau, a non-parametric measure of correlation between two variables.

Given a number of data points in a scatterplot, each point is connected to every other point by a line segment. Each segment is coloured according to these criteria:

  1. line segment is black if its slope is positive;
  2. line segment is red if its slope is negative;
  3. line segment is blue if its slope is 0 (horizontally flat line);
  4. line segment is black as in 1. if its slope is undefined (vertical line).
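
As a minimal sketch, the four criteria above can be written as a small vectorised helper. The function name `segment_colour` and its arguments are hypothetical (they do not come from the article or from any package), and treating two coincident points like a vertical segment is an assumption, since the criteria do not cover that case.

```r
# Hypothetical helper: colour of the segment joining (x1, y1) and (x2, y2),
# following the four criteria listed above. Works on vectors of equal length.
segment_colour <- function(x1, y1, x2, y2) {
  dx <- x2 - x1
  dy <- y2 - y1
  ifelse(dx == 0, "black",                      # vertical, slope undefined (rule 4)
  ifelse(dy == 0, "blue",                       # horizontal, slope 0 (rule 3)
  ifelse(sign(dx) == sign(dy), "black",         # positive slope (rule 1)
                               "red")))         # negative slope (rule 2)
}

# Example: first pair is horizontal, second pair slopes downwards
segment_colour(c(1, 5), c(7, 7), c(5, 7), c(7, 5))
```

Comparing the signs of `dx` and `dy` avoids dividing by zero, so vertical segments need no special handling beyond the first branch.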

Here is an example from the original article:

[figure: example plot from Davis and Chen (2006)]

My problem is that I can generate a scatterplot, but not the line segments that connect all possible pairs of points and change colour according to the criteria above.

Here is an example dataset:

dataset <- dplyr::tibble(alpha = c(1, 5, 7, 8, 9, 10, 11, 12), 
              beta =  c(7, 7, 5, 4, 3, 14, 15, 18))

I can generate this:

ggplot2::ggplot(dataset, ggplot2::aes(x = alpha, y = beta)) + ggplot2::geom_point()

[figure: scatterplot of the example dataset]

but not this:

[figure: the same scatterplot with every pair of points connected by a coloured segment]

NOTE: The solution has to generalise to a dataset with a large number of data points (~1000).


Solution

  • There are many ways to do this, but you need to build your own data frame of segments, e.g.:

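    library(tidyverse)
    
    # For every point, attach all the other points (renamed to x, y) as segment ends
    pd <- dataset %>% 
      mutate(d = map(row_number(), function(x) slice(., -x) %>% rename(x = alpha, y = beta))) %>% 
      unnest(d) %>% 
      mutate(
        slope = (y - beta) / (x - alpha),
        # Vertical segments have an infinite slope and are grouped with positive ones
        cat = case_when(
          is.infinite(slope) | slope > 0 ~ 'a', 
          slope < 0 ~ 'b',
          slope == 0 ~ 'c'
        )
      )
    
    # Draw the segments first so that the points sit on top of them
    ggplot() +
      geom_segment(aes(x = alpha, y = beta, xend = x, yend = y, color = cat), pd) +
      geom_point(aes(x = alpha, y = beta), dataset) +
      scale_color_manual(values = c(a = 'black', b = 'red', c = 'blue'))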
    

    [figure: the resulting plot]
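
The approach in the answer pairs each point with every other point, so each segment appears twice (once in each direction). For larger datasets, a possible variant is to build each unordered pair only once. The following is a sketch under stated assumptions rather than a definitive implementation: it uses base R's `combn` for the pairs, the names `idx`, `segs`, `counts` and `tau_a` are illustrative, and the final ratio is a simple tau-a style summary derived from the colour counts (vertical segments count as black, as in the question's criteria).

```r
# Same example data as in the question, as a base R data frame
dataset <- data.frame(alpha = c(1, 5, 7, 8, 9, 10, 11, 12),
                      beta  = c(7, 7, 5, 4, 3, 14, 15, 18))

# All unordered index pairs (i < j): a 2 x choose(n, 2) matrix
idx <- utils::combn(nrow(dataset), 2)

# One row per segment, with start (x, y) and end (xend, yend)
segs <- data.frame(
  x    = dataset$alpha[idx[1, ]], y    = dataset$beta[idx[1, ]],
  xend = dataset$alpha[idx[2, ]], yend = dataset$beta[idx[2, ]]
)
dx <- segs$xend - segs$x
dy <- segs$yend - segs$y

# Colour rules from the question; vertical segments (dx == 0) are black
segs$colour <- ifelse(dx == 0, "black",
               ifelse(dy == 0, "blue",
               ifelse(sign(dx) == sign(dy), "black", "red")))

# Number of segments of each colour
counts <- table(factor(segs$colour, levels = c("black", "red", "blue")))

# Tau-a style summary: (black - red) / number of pairs
tau_a <- (counts[["black"]] - counts[["red"]]) / ncol(idx)

# Plot only if ggplot2 is installed; the colour column already holds colour names
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot() +
    ggplot2::geom_segment(
      ggplot2::aes(x = x, y = y, xend = xend, yend = yend, colour = colour),
      data = segs
    ) +
    ggplot2::geom_point(ggplot2::aes(x = alpha, y = beta), data = dataset) +
    ggplot2::scale_colour_identity()
  print(p)
}
```

With n points this produces choose(n, 2) rows, about 500,000 for n = 1000, which is half of what the both-directions approach generates.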