How to implement a timeline in R with start date, end date, and a marker for a "middle date"?


We have a data frame in R of the following format:

Type  Request ID  Event Name  First Seen       Update           Last Seen
A     1           Event1      1/29/2017 19:54  4/19/2017 14:16  4/19/2017 15:05
A     2           Event2      2/13/2017 14:20  5/2/2017 12:48   5/2/2017 12:54
A     3           Event3      4/29/2017 16:30  5/12/2017 11:05  5/12/2017 12:08
B     4           Event4      5/17/2017 20:23  5/18/2017 12:46  5/18/2017 16:15

The corresponding CSV file is:

Type,Request ID,Event Name,First Seen,Update,Last Seen
A,1,Event1,1/29/2017 19:54,4/19/2017 14:16,4/19/2017 15:05
A,2,Event2,2/13/2017 14:20,5/2/2017 12:48,5/2/2017 12:54
A,3,Event3,4/29/2017 16:30,5/12/2017 11:05,5/12/2017 12:08
B,4,Event4,5/17/2017 20:23,5/18/2017 12:46,5/18/2017 16:15

We want to show each event on a timeline in R so that its start date, update, and end date are all visible.

We were able to get very close to this with our implementation in R which goes like this:

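install.packages("timevis")
library("timevis")
library("dplyr")  # provides rename()

df <- read.csv("data.csv", header = TRUE)
# read.csv() turns the spaces in the column names into dots (First.Seen, Last.Seen, ...)
df_new <- rename(df, start = First.Seen, end = Last.Seen, content = Request.ID)
timevis(df_new)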

Note that we are only using 'start' date and 'end' date in this implementation. This plots the following timeline:

[Image: the resulting timeline]

Now we want to incorporate the 'Update' date and time so that each bar also shows a marker at the update time. The bar should start at the start date, end at the end date, and carry a marker at the position that corresponds to 'Update'.

How can we implement this in R?


Solution

  • Your data

    df <- structure(list(Type = c("A", "A", "A", "B"), Request.ID = 1:4, 
    Event.Name = c("Event1", "Event2", "Event3", "Event4"), First.Seen = structure(c(1485719640, 
    1486995600, 1493483400, 1495052580), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), Update = structure(c(1492611360, 1493729280, 
    1494587100, 1495111560), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), Last.Seen = structure(c(1492614300, 1493729640, 
    1494590880, 1495124100), tzone = "UTC", class = c("POSIXct", 
    "POSIXt"))), class = "data.frame", .Names = c("Type", "Request.ID", 
    "Event.Name", "First.Seen", "Update", "Last.Seen"), row.names = c(NA, 
    -4L))
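
If you start from the CSV in the question instead of the dput() output, the same structure can be rebuilt by parsing the three timestamp columns. This is only a sketch: it assumes the timestamps are month/day/year and in UTC (the time zone used in the structure above), and it inlines the CSV through read.csv(text = ...) purely to stay self-contained. The names csv_text and time_cols are illustrative.

```r
# CSV text copied from the question; with a real file use read.csv("data.csv")
csv_text <- "Type,Request ID,Event Name,First Seen,Update,Last Seen
A,1,Event1,1/29/2017 19:54,4/19/2017 14:16,4/19/2017 15:05
A,2,Event2,2/13/2017 14:20,5/2/2017 12:48,5/2/2017 12:54
A,3,Event3,4/29/2017 16:30,5/12/2017 11:05,5/12/2017 12:08
B,4,Event4,5/17/2017 20:23,5/18/2017 12:46,5/18/2017 16:15
"

# read.csv() converts "First Seen" to First.Seen, and so on
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the three timestamp columns (assumed format: m/d/Y H:M, assumed zone: UTC)
time_cols <- c("First.Seen", "Update", "Last.Seen")
df[time_cols] <- lapply(df[time_cols], as.POSIXct,
                        format = "%m/%d/%Y %H:%M", tz = "UTC")

str(df)
```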
    

  • tidyverse solution

    I melt First.Seen and Update into a single column, so each request gets two rows. On the Update rows I set Last.Seen to NA, which turns them into single points in time. I then add a type column ("point" for those single points, "background" for the ranges, so that the two can overlap) and a group column.

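    library(tidyverse)
    library(reshape2)
    library(lubridate)
    library(timevis)
    df1 <- df %>% 
             # stack First.Seen and Update into variable/value pairs, per request
             # (nest/unnest written in tidyr >= 1.0 syntax)
             nest(data = c(First.Seen, Update)) %>% 
             mutate(data = map(data, ~melt(.x))) %>% 
             unnest(data) %>%
             # Update rows get no end time, so they become single points
             mutate(Last.Seen = ifelse(variable == "Update", as.character(NA), as.character(Last.Seen))) %>%
             mutate(Last.Seen = ymd_hms(Last.Seen)) %>%
             # points for the Update rows, background ranges for the rest
             mutate(type = ifelse(is.na(Last.Seen), "point", "background")) %>%
             mutate(group = Request.ID) %>%
             # column names that timevis expects
             rename(start = value, end = Last.Seen, content = Request.ID)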
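
As an alternative to the nest/melt/unnest route, the same reshaping can probably be written with tidyr::pivot_longer(), which avoids the reshape2 dependency. This is a sketch rather than the method used above; it rebuilds df from the question's CSV (assuming UTC timestamps) so that it runs on its own, and it uses base replace() to blank out Last.Seen on the Update rows.

```r
library(dplyr)
library(tidyr)

# Rebuild the input from the question's CSV (assumed zone: UTC)
csv_text <- "Type,Request ID,Event Name,First Seen,Update,Last Seen
A,1,Event1,1/29/2017 19:54,4/19/2017 14:16,4/19/2017 15:05
A,2,Event2,2/13/2017 14:20,5/2/2017 12:48,5/2/2017 12:54
A,3,Event3,4/29/2017 16:30,5/12/2017 11:05,5/12/2017 12:08
B,4,Event4,5/17/2017 20:23,5/18/2017 12:46,5/18/2017 16:15
"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
time_cols <- c("First.Seen", "Update", "Last.Seen")
df[time_cols] <- lapply(df[time_cols], as.POSIXct,
                        format = "%m/%d/%Y %H:%M", tz = "UTC")

df1 <- df %>%
  # one row per (request, timestamp kind): First.Seen and Update share a column
  pivot_longer(c(First.Seen, Update),
               names_to = "variable", values_to = "start") %>%
  mutate(
    # Update rows have no end time, so they become single points
    Last.Seen = replace(Last.Seen, variable == "Update", NA),
    type      = ifelse(is.na(Last.Seen), "point", "background"),
    group     = Request.ID
  ) %>%
  # column names that timevis expects
  rename(end = Last.Seen, content = Request.ID)

print(df1)
```

The result has two rows per request, in the same column order as the output shown below, except that it is a tibble rather than a plain data frame.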
    

    First 4 rows of df1

       Type content Event.Name                 end   variable               start       type group
    1     A       1     Event1 2017-04-19 15:05:00 First.Seen 2017-01-29 19:54:00 background     1
    2     A       1     Event1                  NA     Update 2017-04-19 14:16:00      point     1
    3     A       2     Event2 2017-05-02 12:54:00 First.Seen 2017-02-13 14:20:00 background     2
    4     A       2     Event2                  NA     Update 2017-05-02 12:48:00      point     2
    

    Pass groups= to define one timeline row per group, along with the label to show for each row:

    timevis(data=df1, groups=data.frame(id=unique(df1$group), content=LETTERS[unique(df1$content)]))
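
The groups argument is a data frame with one row per group: id has to match the group column of the items, and content is the label drawn next to that timeline row. The call above labels the rows A to D; as a variation, the sketch below labels them with the event names instead. It uses a small hand-built items data frame (the first two requests from the output above) so that it runs on its own; the names items, groups and tl are illustrative.

```r
library(timevis)

# Hand-built items for the first two requests: one background range and
# one Update point per request
items <- data.frame(
  content    = c(1L, 1L, 2L, 2L),
  Event.Name = c("Event1", "Event1", "Event2", "Event2"),
  start      = c("2017-01-29 19:54:00", "2017-04-19 14:16:00",
                 "2017-02-13 14:20:00", "2017-05-02 12:48:00"),
  end        = c("2017-04-19 15:05:00", NA,
                 "2017-05-02 12:54:00", NA),
  type       = c("background", "point", "background", "point"),
  group      = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# One row per group: id matches items$group, content is the row label
g <- unique(items[, c("group", "Event.Name")])
groups <- data.frame(id = g$group, content = g$Event.Name,
                     stringsAsFactors = FALSE)

# Pass only the columns that timevis uses for items
tl <- timevis(
  data   = items[, c("content", "start", "end", "type", "group")],
  groups = groups
)
tl
```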
    

    This produces four timeline rows, one per request, each showing the First.Seen to Last.Seen range with the Update point marked on it.