I have the following data format -
Id Record Date Medium
1 display 9/7/2016 A
1 display 9/8/2016 B
1 display 9/9/2016 A
1 Interaction 9/10/2016 B
1 display 9/11/2016 A
1 display 9/12/2016 B
1 display 9/13/2016 A
1 Interaction 9/14/2016 B
1 display 9/15/2016 C
1 display 9/16/2016 B
2 display 9/17/2016 A
2 display 9/18/2016 B
2 display 9/19/2016 A
2 Interaction 9/20/2016 B
2 display 9/21/2016 A
2 display 9/22/2016 B
2 display 9/23/2016 A
2 display 9/24/2016 B
Brief about the Data - For a particular id, the data shows the journey of the user from one medium to another, which then maybe results in website interaction.
Ex: For id "2" the journey was from Medium 1>2>1>2 (-> Note that the journey stopped hear as there was a website interaction). The blank lines inserted in the data above indicate where the journey stops. Thus, in this case, ID "2" has 2 different journeys.
NOTE: I have already sorted the data using ID and Date as variables, thus the above case will always happen and we do not consider Date here.
What i am trying to accomplish To create a new table, with all the paths in first column, Count of conversions in second and count of non-conversions in the third.
The final output for the above table will be as follows -
PATH Count_Conversion Count_Non-Convert
A>B>A>B 3 1
C>B 0 1
Kindly help with a code in R.
For example
library(tidyverse)
df <- read_table("Id Record Date Medium
1 display 9/7/2016 A
1 display 9/8/2016 B
1 display 9/9/2016 A
1 Interaction 9/10/2016 B
1 display 9/11/2016 A
1 display 9/12/2016 B
1 display 9/13/2016 A
1 Interaction 9/14/2016 B
1 display 9/15/2016 C
1 display 9/16/2016 B
2 display 9/17/2016 A
2 display 9/18/2016 B
2 display 9/19/2016 A
2 Interaction 9/20/2016 B
2 display 9/21/2016 A
2 display 9/22/2016 B
2 display 9/23/2016 A
2 display 9/24/2016 B")
df %>%
mutate(Id = cumsum(is.na(Id))+1) %>%
filter(!is.na(Medium)) %>%
group_by(Id) %>%
summarise(
path = paste(Medium, collapse=">"),
conversion = "Interaction" %in% Record
) %>%
group_by(path) %>%
summarise(
cons=sum(conversion),
no_cons=sum(!conversion)
)
# # A tibble: 2 x 3
# path cons no_cons
# <chr> <int> <int>
# 1 A>B>A>B 3 1
# 2 C>B 0 1