r, datetime, networkd3

How to draw a network diagram from data frame columns in R?


I have a data frame of customers, and I want to draw the customer stages as a network diagram. Sample data is below.

cust_id     checkin time           stage2                     stage3              checkout time
12345   2019-01-01 07:02:50     2019-01-01 07:23:25        2019-01-01 07:23:22  2019-01-01 08:37:43
56789   2019-01-01 07:25:21     2019-01-01 07:35:29        2019-01-01 07:35:27  2019-01-01 09:36:06
43256   2019-01-01 07:27:22     2019-01-01 07:42:49        NA                   2019-01-01 09:34:55
34567   2019-01-01 07:22:15     2019-01-01 08:25:35        2019-01-01 07:26:02  2019-01-01 09:00:40
89765   2019-01-01 08:29:35     2019-01-01 08:30:58        NA                   2019-01-01 09:02:48
23456   2019-01-01 08:54:12     2019-01-01 09:18:46        2019-01-01 09:08:34  2019-01-01 09:46:38

The raw data looks like the above. There is no fixed path for a customer: some customers check out after stage2, while others go on to stage3 and check out after that.

Basically, I want to draw a network map of the customer stages like this:

checkin > stage2 > stage3 > checkout
             |
            checkout

How can I do that in R?
I tried the following with the networkD3 package:

library(igraph)
library(networkD3)
p <- simpleNetwork(df, height="100px", width="100px",        
                   Source = 1,                 # column number of source
                   Target = 5,                 # column number of target
                   linkDistance = 10,          # distance between node. Increase this value to have more space between nodes
                   charge = -900,                # numeric value indicating either the strength of the node repulsion (negative value) or attraction (positive value)
                   fontSize = 14,               # size of the node names
                   fontFamily = "serif",       # font of node names
                   linkColour = "#666",        # colour of edges, MUST be a common colour for the whole graph
                   nodeColour = "#69b3a2",     # colour of nodes, MUST be a common colour for the whole graph
                   opacity = 0.9,              # opacity of nodes. 0=transparent. 1=no transparency
                   zoom = T                    # Can you zoom on the figure?
)

p

Please help me find a way to do this.


Solution

  • Here's one solution using networkD3:

    library(tidyverse)
    library(lubridate)
    library(networkD3)
    
    data <- 
      tribble(
      ~cust_id, ~checkin.time,         ~stage2,               ~stage3,               ~checkout.time,
      12345,    "2019-01-01 07:02:50", "2019-01-01 07:23:25", "2019-01-01 07:23:22", "2019-01-01 08:37:43",
      56789,    "2019-01-01 07:25:21", "2019-01-01 07:35:29", "2019-01-01 07:35:27", "2019-01-01 09:36:06",
      43256,    "2019-01-01 07:27:22", "2019-01-01 07:42:49", NA,                    "2019-01-01 09:34:55",
      34567,    "2019-01-01 07:22:15", "2019-01-01 08:25:35", "2019-01-01 07:26:02", "2019-01-01 09:00:40",
      89765,    "2019-01-01 08:29:35", "2019-01-01 08:30:58", NA,                    "2019-01-01 09:02:48",
      23456,    "2019-01-01 08:54:12", "2019-01-01 09:18:46", "2019-01-01 09:08:34", "2019-01-01 09:46:38"
      ) %>% 
      mutate(across(!cust_id, ~ymd_hms(.x, tz = "UTC")))
    
    data %>% 
      select(-cust_id) %>% 
      # replace each timestamp with its column (stage) name; keep NA for skipped stages
      mutate(across(everything(), ~if_else(is.na(.x), NA_character_, cur_column()))) %>% 
      mutate(row = row_number()) %>%
      mutate(origin = .[[1]]) %>%
      # reshape to long form: one row per customer and stage
      gather("column", "source", -row, -origin) %>%
      mutate(column = match(column, names(data))) %>%
      filter(!is.na(source)) %>% 
      # order stages by column position, then pair each stage with the next one
      arrange(row, column) %>%
      group_by(row) %>%
      mutate(target = lead(source)) %>%
      ungroup() %>%
      filter(!is.na(source) & !is.na(target)) %>%
      # give checkout a separate node per preceding stage
      mutate(target = if_else(target == "checkout.time", paste0(target, " from ", source), target)) %>% 
      select(source, target, origin) %>%
      # count customers per transition
      group_by(source, target, origin) %>%
      summarise(count = n(), .groups = "drop") %>%
      simpleNetwork()
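
    The core idea of the pipeline is to turn each customer's row into a path of visited stages and then pair every stage with the next one. As a rough illustration only, here is a base-R sketch of that step. The `visited` matrix is a hand-typed stand-in for the sample data (1 = the stage has a timestamp, NA = skipped), and the shortened stage names are my own. Like the pipeline above, it orders stages by column position, not by timestamp.

```r
stages <- c("checkin", "stage2", "stage3", "checkout")

# One row per customer; NA marks a skipped stage.
# Same pattern as the sample data: customers 3 and 5 skip stage3.
visited <- matrix(
  c(1, 1, 1,  1,
    1, 1, 1,  1,
    1, 1, NA, 1,
    1, 1, 1,  1,
    1, 1, NA, 1,
    1, 1, 1,  1),
  ncol = 4, byrow = TRUE, dimnames = list(NULL, stages)
)

# For each customer, keep the visited stages in column order and pair
# each stage with the one that follows it.
edges <- do.call(rbind, lapply(seq_len(nrow(visited)), function(i) {
  path <- stages[!is.na(visited[i, ])]
  data.frame(source = head(path, -1), target = tail(path, -1))
}))

# Count how many customers took each transition.
edge_counts <- aggregate(
  count ~ source + target,
  data = transform(edges, count = 1L),
  FUN = sum
)
edge_counts
```

    Unlike the answer's pipeline, this sketch keeps a single checkout node instead of splitting it into "checkout.time from stage2" and "checkout.time from stage3".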
    

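
    One thing to be aware of: simpleNetwork() only reads the source and target columns, so as far as I can tell the `count` column computed above does not affect the drawing. If you want the counts to show up as link widths, a possible variation is forceNetwork() from the same package. The sketch below is an assumption-laden example, not part of the original answer: the edge list is typed in by hand from the sample data, the stage names are shortened, and the `source_id`, `target_id`, and `group` columns are names I made up.

```r
# Edge list with counts, typed in by hand from the sample data's transitions.
links <- data.frame(
  source = c("checkin", "stage2", "stage2", "stage3"),
  target = c("stage2", "stage3", "checkout", "checkout"),
  count  = c(6, 4, 2, 4)
)

# forceNetwork() needs a node table plus zero-based integer indices into it.
nodes <- data.frame(name = unique(c(links$source, links$target)), group = 1)
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1

# Only build the widget if networkD3 is installed.
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::forceNetwork(
    Links = links, Nodes = nodes,
    Source = "source_id", Target = "target_id", Value = "count",
    NodeID = "name", Group = "group",
    arrows = TRUE,        # draw direction of each transition
    opacityNoHover = 1,   # keep node labels visible
    zoom = TRUE
  )
  # Type p at the console (or call print(p)) to render the diagram.
}
```

    The `- 1` matters because networkD3 passes the indices to JavaScript, which counts from zero.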