Tags: r, ggplot2, line-plot

How to draw a line plot when dates are in two different columns?


Let's say I have the following dataset:

dt <- data.frame(id= c(1),
                 parameter= c("a","b","c"),
                 start_day = c(1,8,4),
                 end_day = c(16,NA,30))


I want to create a type of line chart in which the "parameter" column is on the y-axis and both "start_day" and "end_day" are on the x-axis.

Also, if both "start_day" and "end_day" have values, they should be connected by a line. If there is no "end_day" (as for parameter "b"), the "start_day" should be connected to an arrow indicating that there is no "end_day" for that parameter. (I know it sounds confusing, but I will give an example to clarify.)

I know that for a line chart I need all the dates in one column, but my data frame has two separate columns (start and end days). So I think a line chart is not the proper tool for this case, and instead I tried swimmer_points_from_lines and swimmer_arrows from the swimplot package.

I added a new column named "cont" to be used in swimmer_arrows.

dt$cont <- with(dt, ifelse(end_day > 0 ,1, 0))

library(ggplot2)
library(swimplot)

ggplot(data = dt)+
  swimmer_points_from_lines(df= dt, 
                            id="parameter", 
                            start= "start_day",
                            end="end_day")+
  swimmer_arrows(df_arrows = dt,
                 id="parameter",
                 arrow_start = "start_day",
                 cont = "cont")+
  coord_flip()

The outcome is as follows: [image: resulting plot]

What I am looking for at this point is a way to draw a line between "start_day" and "end_day" when both exist. If there is no "end_day", I want an arrow indicating that there is no end date (the exact opposite of what I am getting right now).

Any help is much appreciated.


Solution

  • To draw a line between points whose values are in separate columns, you could use geom_segment. To add an arrow to observations with no end date, one option is to split your data into two parts, one with non-missing and one with missing end dates, and use two geom_segment layers:

    library(ggplot2)
    
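    ggplot(dt, aes(y = parameter)) + 
      # Rows with an end day: plain segment from start_day to end_day
      geom_segment(data = ~subset(.x, !is.na(end_day)),
                   aes(x = start_day, xend = end_day, yend = parameter)) +
      # Rows without an end day: short segment ending in an arrow head
      geom_segment(data = ~subset(.x, is.na(end_day)),
                   aes(x = start_day, xend = start_day + 1, yend = parameter), 
                   arrow = arrow(type = "closed", length = unit(0.1, "inches"))) +
      # Mark start and end days; the missing end_day is dropped with a warning
      geom_point(aes(x = start_day, shape = "start_day"), size = 3, color = "red") +
      geom_point(aes(x = end_day, shape = "end_day"), size = 3, color = "red") +
      # Shapes are assigned in alphabetical order: end_day = 16, start_day = 17
      scale_shape_manual(values = c(16, 17))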
    #> Warning: Removed 1 rows containing missing values (geom_point).
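
As a variation on the same idea, the split can be prepared once in the data frame instead of inside each layer. Note that `ifelse(end_day > 0, 1, 0)` from the question returns NA (not 0) when `end_day` is missing, so `is.na()` is probably the safer test. The sketch below is only one possible way to do it: the columns `ongoing` and `xend` and the variable `arrow_len` are hypothetical names introduced for illustration, and the arrow length of 1 x-axis unit is an assumption carried over from the code above.

```r
library(ggplot2)

# Same data as in the question.
dt <- data.frame(id = 1,
                 parameter = c("a", "b", "c"),
                 start_day = c(1, 8, 4),
                 end_day = c(16, NA, 30))

# Hypothetical helper columns (not part of the original post):
# `ongoing` is TRUE when there is no end day,
# `xend` is where the drawn segment should stop.
arrow_len <- 1  # assumed arrow length, in x-axis units
dt$ongoing <- is.na(dt$end_day)
dt$xend <- ifelse(dt$ongoing, dt$start_day + arrow_len, dt$end_day)

p <- ggplot(dt, aes(y = parameter)) +
  # Finished rows: plain segment from start_day to end_day
  geom_segment(data = dt[!dt$ongoing, ],
               aes(x = start_day, xend = xend, yend = parameter)) +
  # Ongoing rows: short segment ending in an arrow head
  geom_segment(data = dt[dt$ongoing, ],
               aes(x = start_day, xend = xend, yend = parameter),
               arrow = arrow(type = "closed", length = unit(0.1, "inches"))) +
  geom_point(aes(x = start_day, shape = "start_day"), size = 3, color = "red") +
  # Only plot end points that exist, which avoids the missing-value warning
  geom_point(data = dt[!dt$ongoing, ],
             aes(x = end_day, shape = "end_day"), size = 3, color = "red") +
  # Named values make the shape mapping explicit
  scale_shape_manual(values = c(start_day = 16, end_day = 17))
```

Calling `print(p)` renders the plot. Keeping the flag in the data also makes it easy to reuse elsewhere, for example to colour ongoing parameters differently.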