Search code examples
rtidyverse

fct_reorder by date on recurring items


Trying to make Gantt style chart for complex antibiotic regimens we sometimes have to give patients.

Sometimes the antibiotic dose is changed then put back to original dose, or antibiotic changed completely before then being changed back again at later date.

Tried to capture problem in reprex below.

library(tidyverse)
library(lubridate)

df <- tribble(
  ~id,  ~start, ~end, ~antibiotic,
  1,    "02/02/23", "22/02/23", "A 1g",
  2,    "22/02/23", "10/03/23", "A 2g",
  3,    "10/03/23", "15/03/23", "A 1g",
  4,    "28/02/23", "11/03/23", "B 1g",
  5,    "11/03/23", "03/04/23", "B 2g",
  6,    "03/04/23", "10/04/23", "B 1g")


# trying to reorder the antibiotic factor level based on start date then pivot longer into tidy data form
df <- df%>%
  mutate(across(c("start", "end"), dmy),
         antibiotic = fct_reorder(antibiotic, start, .desc = TRUE)) %>%
  pivot_longer(
    cols = c("start", "end"),
    names_to = "start_end",
    values_to = "date"
  )

# create gantt style plot
df %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = antibiotic, colour = antibiotic, group = id), linewidth = 5)

I was expecting antibiotic B 1g to be diplayed above B 2g, as it has a start date before B 2g, just like antibiotic A.

When I check the factor levels, antibiotic A appears to be in correct order but antibiotic B has swapped around:

# levels should be "A 1g" "A 2g"  "B 1g"  "B 2g"

rev(levels(df$antibiotic))
#> [1] "A 1g" "A 2g" "B 2g" "B 1g"

I can't get my head round why fct_reorder is doing this correctly for A but not B.

Would be grateful if anyone could explain this.

Created on 2023-04-03 with reprex v2.0.2


Solution

  • It's because you have multiple start dates for each level of the factor. fct_reorder() doesn't know which one to choose. One option is to use slice_min() and construct the factor levels based on the first date shown:

    antibiotic_levels <- df |> 
      mutate(across(c("start", "end"), dmy)) |> 
      group_by(antibiotic) |> 
      slice_min(start) |> 
      arrange(start) |> 
      mutate(antibiotic = factor(antibiotic, levels = antibiotic)) |> 
      pull(antibiotic)
    
    df %>%
      mutate(across(c("start", "end"), dmy),
             antibiotic = factor(antibiotic, levels = rev(antibiotic_levels))) |> 
      pivot_longer(
        cols = c("start", "end"),
        names_to = "start_end",
        values_to = "date"
      ) %>%
      ggplot(aes(x = date)) +
      geom_line(aes(y = antibiotic, colour = antibiotic, group = id), linewidth = 5)
    

    which gives:

    Gantt chart example