Search code examples
rmarkov

Transform Data format for "channelAttribution" package in R


I have the following data format -

Id     Record      Date      Medium
  1    display     9/7/2016     A
  1    display     9/8/2016     B
  1    display     9/9/2016     A
  1    Interaction 9/10/2016    B

  1    display     9/11/2016    A
  1    display     9/12/2016    B
  1    display     9/13/2016    A
  1    Interaction 9/14/2016    B

  1    display     9/15/2016    C
  1    display     9/16/2016    B

  2    display     9/17/2016    A
  2    display     9/18/2016    B
  2    display     9/19/2016    A
  2    Interaction 9/20/2016    B

  2    display     9/21/2016    A
  2    display     9/22/2016    B
  2    display     9/23/2016    A
  2    display     9/24/2016    B

Brief about the Data - For a particular id, the data shows the journey of the user from one medium to another, which then maybe results in website interaction.

Ex: For id "2" the journey was from Medium 1>2>1>2 (-> Note that the journey stopped hear as there was a website interaction). The blank lines inserted in the data above indicate where the journey stops. Thus, in this case, ID "2" has 2 different journeys.

NOTE: I have already sorted the data using ID and Date as variables, thus the above case will always happen and we do not consider Date here.

What i am trying to accomplish To create a new table, with all the paths in first column, Count of conversions in second and count of non-conversions in the third.

  • A conversion is when the path ends with "Record = Interaction"

The final output for the above table will be as follows -

    PATH    Count_Conversion   Count_Non-Convert
  A>B>A>B        3                      1
  C>B            0                      1

Kindly help with a code in R.


Solution

  • For example

    library(tidyverse)
    df <- read_table("Id     Record      Date      Medium
      1    display     9/7/2016     A
      1    display     9/8/2016     B
      1    display     9/9/2016     A
      1    Interaction 9/10/2016    B
    
      1    display     9/11/2016    A
      1    display     9/12/2016    B
      1    display     9/13/2016    A
      1    Interaction 9/14/2016    B
    
      1    display     9/15/2016    C
      1    display     9/16/2016    B
    
      2    display     9/17/2016    A
      2    display     9/18/2016    B
      2    display     9/19/2016    A
      2    Interaction 9/20/2016    B
    
      2    display     9/21/2016    A
      2    display     9/22/2016    B
      2    display     9/23/2016    A
      2    display     9/24/2016    B")
    df %>% 
      mutate(Id = cumsum(is.na(Id))+1) %>% 
      filter(!is.na(Medium)) %>% 
      group_by(Id) %>% 
      summarise(
        path = paste(Medium, collapse=">"), 
        conversion = "Interaction" %in% Record
      ) %>% 
      group_by(path) %>% 
      summarise(
        cons=sum(conversion), 
        no_cons=sum(!conversion)
      )
    # # A tibble: 2 x 3
    #      path  cons no_cons
    #     <chr> <int>   <int>
    # 1 A>B>A>B     3       1
    # 2     C>B     0       1