Creating a retention variable in R based on the value of X in year t+1


I want to create a retention variable in R. The data look as follows:

ID identifies an individual who participated in year t.

Albert.Heijn is 1 if the individual visited Albert Heijn.

Albert.Heijnv1-7 are customer satisfaction measurements.

If Albert.Heijn is NA, the individual did not visit the company in that year, so the satisfaction measurements are NA as well.

[Screenshot of the data]

Now I need to create a retention variable, probably using a for loop. For example, ID 14401 gets retention = 1 for 2012, because Albert.Heijn = 1 in 2013. However, the same person does not get retention in 2013, because 2014 is missing. In 2015, retention is again 1, because Albert.Heijn = 1 in 2016. For 2016, retention is 0, because no 2017 data are available.
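The rule above can also be expressed without a loop: for each row, look up the row of the same ID in year t+1 and check Albert.Heijn there. This swaps the for loop for base R's match() on an (ID, Year) key. A minimal sketch, assuming the data sit in a data frame with columns ID, Year and Albert.Heijn; the toy values only mimic the ID 14401 example and are made up:

```r
# Toy data (made up): ID 14401 observed in 2012, 2013, 2015 and 2016
df <- data.frame(
  ID           = c(14401, 14401, 14401, 14401),
  Year         = c(2012, 2013, 2015, 2016),
  Albert.Heijn = c(1, 1, 1, 1)
)

# For each row, the index of the same ID's row in year t + 1 (NA if absent)
key      <- paste(df$ID, df$Year)
next_row <- match(paste(df$ID, df$Year + 1), key)

# Retention = 1 if the next-year row exists and Albert.Heijn == 1 there;
# %in% returns FALSE (not NA) when the looked-up value is NA
df$retention <- as.integer(!is.na(next_row) &
                           df$Albert.Heijn[next_row] %in% 1)
df
```

This gives retention 1, 0, 1, 0 for 2012, 2013, 2015 and 2016, as described in the question.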

Finally, the 2013 and 2016 rows should then be deleted, since retention cannot be measured when there is no observation for the following year.
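Deleting those rows amounts to keeping only the rows for which the same ID also has a row in the following year. A sketch with made-up data, assuming "no observation" means the ID has no row at all for year t+1 (rather than a row where Albert.Heijn is NA):

```r
# Toy data (made up): ID 14401 observed in 2012, 2013, 2015 and 2016
df <- data.frame(
  ID           = c(14401, 14401, 14401, 14401),
  Year         = c(2012, 2013, 2015, 2016),
  Albert.Heijn = c(1, 1, 1, 1)
)

# TRUE if the same ID also has a row in year t + 1
has_next <- paste(df$ID, df$Year + 1) %in% paste(df$ID, df$Year)

# Keep only rows for which retention can be measured
df_measurable <- df[has_next, ]
df_measurable$Year
```

Only the 2012 and 2015 rows remain.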

This needs to be done for 180+ different companies.
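If each company is a separate column, the lookup of the year t+1 row only has to be done once and can then be reused for every company column. A sketch under that assumption; CompanyB and the toy values are hypothetical, and companies would in practice hold all 180+ company column names:

```r
# Toy data (made up); CompanyB is a hypothetical second company column
df <- data.frame(
  ID           = c(1, 1, 1, 2, 2),
  Year         = c(2012, 2013, 2014, 2012, 2013),
  Albert.Heijn = c(1, NA, 1, 1, 1),
  CompanyB     = c(NA, 1, 1, 1, NA)
)

companies <- c("Albert.Heijn", "CompanyB")  # in practice: all company columns

# Row index of the same ID in year t + 1 (NA if that row does not exist)
next_row <- match(paste(df$ID, df$Year + 1), paste(df$ID, df$Year))

# One retention column per company, e.g. "Albert.Heijn.retention"
for (co in companies) {
  df[[paste0(co, ".retention")]] <- as.integer(df[[co]][next_row] %in% 1)
}

# Drop rows where retention cannot be measured (no row for year t + 1)
df <- df[!is.na(next_row), ]
df
```

Following the rule as stated, retention depends only on year t+1; if a visit in year t should also be required, add a condition on df[[co]] itself inside the loop.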

Can someone help me out? Thanks in advance.


Solution

  • This is a possible solution. You will also need a loop over the IDs.

    Sample data

    df <- data.frame("ID" = c(1,1,2,2,2,2), "Year" = c(2012, 2015,2012,2013,2015,2016), "AH" = c(1, NA, 1,1,1,1))
    

    Code for ID == 2

    current_year <- df[df$ID == 2, "Year"]
    df$retention <- 0
    for (yr in current_year) {
      # Row(s) of the same ID in the following year
      df_temp <- subset(df, Year == yr + 1 & ID == 2)
      # Retention = 1 if that row exists and AH == 1 there
      # (isTRUE() guards against AH being NA)
      if (nrow(df_temp) > 0 && isTRUE(df_temp$AH[1] == 1)) {
        df[df$Year == yr & df$ID == 2, "retention"] <- 1
      }
    }
    

    EDIT - More general code

    If you want to generalize it to all IDs, loop over the unique IDs as well. Note that retention must be initialised once, before the loops; initialising it inside the ID loop would reset the results of every earlier ID. Code below:

    df <- data.frame("ID" = c(1,1,2,2,2,2), "Year" = c(2012, 2015,2012,2013,2015,2016), "AH" = c(1, NA, 1,1,1,1))

    # Initialise once, outside the loops, so results for earlier IDs are kept
    df$retention <- 0

    for (id in unique(df$ID)) {
      current_year <- df[df$ID == id, "Year"]
      for (yr in current_year) {
        # Row(s) of the same ID in the following year
        df_temp <- subset(df, Year == yr + 1 & ID == id)
        # Retention = 1 if that row exists and AH == 1 there
        if (nrow(df_temp) > 0 && isTRUE(df_temp$AH[1] == 1)) {
          df[df$Year == yr & df$ID == id, "retention"] <- 1
        }
      }
    }
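For larger data the nested loops can get slow, because every iteration subsets the whole data frame. A loop-free sketch on the same sample data, swapping the loops for a self-join with base R's merge(): a copy of the data shifted back one year is joined on ID and Year, so each row picks up the same ID's AH value from year t+1. The helper columns AH_next and has_next are names introduced here for illustration:

```r
# The answer's sample data
df <- data.frame("ID" = c(1,1,2,2,2,2),
                 "Year" = c(2012, 2015, 2012, 2013, 2015, 2016),
                 "AH" = c(1, NA, 1, 1, 1, 1))

# Copy shifted back one year: the row for year t + 1 is relabelled as
# year t, so joining on (ID, Year) lines it up with the year-t row
nxt <- data.frame(ID = df$ID, Year = df$Year - 1,
                  AH_next = df$AH, has_next = TRUE)

# Left join keeps every original row; has_next is NA when year t + 1 is absent
out <- merge(df, nxt, by = c("ID", "Year"), all.x = TRUE)

# Retention = 1 if the same ID visited (AH == 1) in year t + 1
out$retention <- as.integer(out$AH_next %in% 1)

# Drop rows where retention cannot be measured (no observation for t + 1)
out <- out[!is.na(out$has_next), c("ID", "Year", "AH", "retention")]
out
```

Only ID 2's 2012 and 2015 rows survive, both with retention 1; the has_next flag is what separates "no row for t+1" from "a row where AH is NA".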