r ggplot2 scatter-plot categorical-data

Package for category overlines on scatterplot in ggplot


I have data organised as in this example:

library(tibble)

data1 <- tibble(
  seq = factor(1:20),
  value = rnorm(20, 10, 2),
  par_a = c(rep("S1", 6), rep("S2", 14)),
  par_b = c(rep("B1", 18), rep("S2", 2))
)

The x-axis variable (seq) and the parameters used for the category overlines are categorical. seq will have about 50 unique values, and each parameter will have 2, 3 or 4 possible values. The y-axis variable (value) is continuous.

I'm looking for a package that will allow me to make a plot like this: [image: desired plot]

I saw one made with ggplot once, so I assume there might be a package that can do it. Unfortunately, I'm unable to find which package it is.

Radek


Solution

  • There may be a package for that, but with some data wrangling you can achieve the desired result with ggplot2 itself, like so:

    library(tidyverse)
    
    set.seed(123)
    
    data1 <- tibble(
      seq = factor(1:20),
      value = rnorm(20, 10, 2),
      par_a = c(rep("S1", 6), rep("S2", 14)),
      par_b = c(rep("B1", 18), rep("B2", 2))
    )
    
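    # Start and end points of the overline for each category of each parameter
    dat_segment <- data1 |>
      select(-value) |>
      pivot_longer(-seq, values_to = "category") |>
      group_by(name, category) |>
      # Keep only the first and last seq of each category
      filter(seq %in% c(first(seq), last(seq))) |>
      mutate(
        # Factor codes are the positions on the discrete x axis
        seq = as.numeric(seq),
        # Shift each end outward by 0.4 (a start at seq 1 is left as-is)
        seq = case_when(
          seq > 1 & seq == last(seq) ~ seq + .4,
          seq > 1 & seq == first(seq) ~ seq - .4,
          .default = seq
        )
      ) |>
      ungroup() |>
      # Fixed y positions above the points, one height per parameter
      mutate(value = if_else(name == "par_a", 20, 18))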
    
    # Label positions: the midpoint of each overline, at the overline's height
    dat_label <- dat_segment |>
      summarise(
        seq = mean(seq), value = unique(value),
        .by = c(name, category)
      )
    
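    # ggplot2 is already attached by library(tidyverse)
    ggplot(data1, aes(seq, value)) +
      geom_point() +
      # Category names, drawn just above the overlines without a box
      geom_label(
        data = dat_label,
        aes(label = category, color = name),
        vjust = 0,
        fill = NA,
        label.size = 0,
        show.legend = FALSE
      ) +
      # Overlines: one line per category of each parameter
      geom_line(
        data = dat_segment,
        aes(
          color = name, group = interaction(category, name)
        ),
        linewidth = 1
      )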
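
The code above takes the first and last seq per category, so it assumes each category occupies one contiguous block of the x axis. If a category could come back after a gap (for example S1, S2, S1), a run-based version may be safer. The following is only a minimal sketch of that idea, not part of the answer above: `run_segments()`, its `pad` argument, and the column names `xmin`, `xmax` and `xmid` are hypothetical names chosen for illustration. It uses base R's `rle()` to find runs of identical consecutive values, and the plotting part only runs if ggplot2 is installed.

```r
# Hypothetical helper: one row per run of identical consecutive values,
# with start/end positions padded outward by `pad`.
run_segments <- function(x, pad = 0.4) {
  runs <- rle(as.character(x))
  end <- cumsum(runs$lengths)       # last position of each run
  start <- end - runs$lengths + 1   # first position of each run
  data.frame(
    category = runs$values,
    xmin = start - pad,
    xmax = end + pad,
    xmid = (start + end) / 2,       # where the label goes
    stringsAsFactors = FALSE
  )
}

par_a <- c(rep("S1", 6), rep("S2", 14))
seg_a <- run_segments(par_a)

# Optional: draw the overlines for par_a at a fixed height of y = 20
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(123)
  data1 <- data.frame(seq = factor(1:20), value = rnorm(20, 10, 2))
  p <- ggplot(data1, aes(seq, value)) +
    geom_point() +
    geom_segment(
      data = seg_a,
      aes(x = xmin, xend = xmax, y = 20, yend = 20),
      inherit.aes = FALSE, linewidth = 1
    ) +
    geom_text(
      data = seg_a,
      aes(x = xmid, y = 20, label = category),
      inherit.aes = FALSE, vjust = -0.5
    )
  print(p)
}
```

Unlike the code above, this sketch pads every run on both sides, including the one starting at position 1. Calling `run_segments()` once per parameter and giving each result its own y value would reproduce the two rows of overlines.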