I have a data frame of customers. I want to draw a customer stages as network diagram. Sample data is like below.
cust_id checkin time stage2 stage3 checkout time
12345 2019-01-01 07:02:50 2019-01-01 07:23:25 2019-01-01 07:23:22 2019-01-01 08:37:43
56789 2019-01-01 07:25:21 2019-01-01 07:35:29 2019-01-01 07:35:27 2019-01-01 09:36:06
43256 2019-01-01 07:27:22 2019-01-01 07:42:49 NA 2019-01-01 09:34:55
34567 2019-01-01 07:22:15 2019-01-01 08:25:35 2019-01-01 07:26:02 2019-01-01 09:00:40
89765 2019-01-01 08:29:35 2019-01-01 08:30:58 NA 2019-01-01 09:02:48
23456 2019-01-01 08:54:12 2019-01-01 09:18:46 2019-01-01 09:08:34 2019-01-01 09:46:38
The raw data is look like above. There is no rule for customer i.e, Some of the customers checkout after stage2 and some of the customers has to go stage 3 and checkout after stage 3 .
Basically , I want to draw network map of the cusomers stages like below:
checkin > stage2 > stage3 > checkout
|
checkout
How to do that in R?
Tried like below with networkD3 package:
library(igraph)
library(networkD3)
p <- simpleNetwork(df, height="100px", width="100px",
Source = 1, # column number of source
Target = 5, # column number of target
linkDistance = 10, # distance between node. Increase this value to have more space between nodes
charge = -900, # numeric value indicating either the strength of the node repulsion (negative value) or attraction (positive value)
fontSize = 14, # size of the node names
fontFamily = "serif", # font og node names
linkColour = "#666", # colour of edges, MUST be a common colour for the whole graph
nodeColour = "#69b3a2", # colour of nodes, MUST be a common colour for the whole graph
opacity = 0.9, # opacity of nodes. 0=transparent. 1=no transparency
zoom = T # Can you zoom on the figure?
)
p
Please, help me to find the way to it.
here's one solution using networkD3
...
library(tidyverse)
library(lubridate)
library(networkD3)
data <-
tribble(
~cust_id, ~checkin.time, ~stage2, ~stage3, ~checkout.time,
12345, "2019-01-01 07:02:50", "2019-01-01 07:23:25", "2019-01-01 07:23:22", "2019-01-01 08:37:43",
56789, "2019-01-01 07:25:21", "2019-01-01 07:35:29", "2019-01-01 07:35:27", "2019-01-01 09:36:06",
43256, "2019-01-01 07:27:22", "2019-01-01 07:42:49", NA, "2019-01-01 09:34:55",
34567, "2019-01-01 07:22:15", "2019-01-01 08:25:35", "2019-01-01 07:26:02", "2019-01-01 09:00:40",
89765, "2019-01-01 08:29:35", "2019-01-01 08:30:58", NA, "2019-01-01 09:02:48",
23456, "2019-01-01 08:54:12", "2019-01-01 09:18:46", "2019-01-01 09:08:34", "2019-01-01 09:46:38"
) %>%
mutate(across(!cust_id, ~ymd_hms(.x, tz = "UTC")))
data %>%
select(-cust_id) %>%
mutate(across(.fns = ~if_else(is.na(.x), NA_character_, cur_column()))) %>%
mutate(row = row_number()) %>%
mutate(origin = .[[1]]) %>%
gather("column", "source", -row, -origin) %>%
mutate(column = match(column, names(data))) %>%
filter(!is.na(source)) %>%
arrange(row, column) %>%
group_by(row) %>%
mutate(target = lead(source)) %>%
ungroup() %>%
filter(!is.na(source) & !is.na(target)) %>%
mutate(target = if_else(target == "checkout.time", paste0(target, " from ", source), target)) %>%
select(source, target, origin) %>%
group_by(source, target, origin) %>%
summarise(count = n()) %>%
ungroup() %>%
simpleNetwork()