I want to create a retention variable in R. The data looks as follows:
ID is an individual who participated in year t. Albert.Heijn is 1 if the individual visited Albert Heijn. Albert.Heijnv1-7 are customer satisfaction measurements.
If Albert.Heijn is NA, the individual did not visit the company in that year, so the satisfaction measurements are NA as well.
Now I need to create a retention variable, probably using a for loop. For example, ID 14401 gets retention = 1 for 2012, because Albert.Heijn = 1 in 2013. However, this same person does not get retention in 2013, because 2014 is missing.
In 2015 retention is 1 again, because Albert.Heijn = 1 in 2016. For 2016 retention will be 0, because no data for 2017 is available.
Finally, the 2013 and 2016 rows should be deleted afterwards, since retention cannot be measured when there is no observation for the following year.
This needs to be done for 180+ different companies.
Can someone help me out? Thanks in advance.
This is a possible solution. You will also need to loop over the ID values.
Sample data
df <- data.frame("ID" = c(1,1,2,2,2,2), "Year" = c(2012, 2015,2012,2013,2015,2016), "AH" = c(1, NA, 1,1,1,1))
Code for ID == 2
current_year <- df[df$ID == 2, "Year"]
n <- length(current_year)
i <- 0
df$retention <- 0
while (i < n) {
  i <- i + 1
  # Row for the same ID in the following year, if there is one
  df_temp <- subset(df, df$Year == (current_year[i] + 1) & df$ID == 2)
  # Test only this ID's next-year row; isTRUE() also treats AH = NA as "not retained"
  if (nrow(df_temp) > 0 && isTRUE(df_temp$AH[1] == 1)) {
    df[df$Year == current_year[i] & df$ID == 2, "retention"] <- 1
  }
}
EDIT - More general code
If you want to generalize it to all ID values, you need to create a list of unique IDs, count them, and loop over them with a while loop. Code below:
df <- data.frame("ID" = c(1,1,2,2,2,2), "Year" = c(2012, 2015,2012,2013,2015,2016), "AH" = c(1, NA, 1,1,1,1))
ID_list <- unique(df$ID)
n_ID <- length(ID_list)
# Initialise once, outside both loops, so results for earlier IDs are not reset
df$retention <- 0
j <- 0
while (j < n_ID) {
  j <- j + 1
  current_year <- df[df$ID == ID_list[j], "Year"]
  n <- length(current_year)
  i <- 0
  while (i < n) {
    i <- i + 1
    # Row for the same ID in the following year, if there is one
    df_temp <- subset(df, df$Year == (current_year[i] + 1) & df$ID == ID_list[j])
    # Test only this ID's next-year row; isTRUE() also treats AH = NA as "not retained"
    if (nrow(df_temp) > 0 && isTRUE(df_temp$AH[1] == 1)) {
      df[df$Year == current_year[i] & df$ID == ID_list[j], "retention"] <- 1
    }
  }
}
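Since the question mentions 180+ companies, a loop-free variant may scale better. The following is only a sketch in base R, under a few assumptions that are not in the original post: each ID appears at most once per year, every company has its own 0/1/NA column, and the helper name add_retention, the has_next_year flag and the second company column Jumbo are made up for illustration. It looks up each row's following-year row with match() and marks retention as NA (rather than 0) when that year was not observed, so those rows can be dropped afterwards.

```r
# Sample data: AH comes from the answer above; Jumbo is a hypothetical second company
df <- data.frame(
  ID    = c(1, 1, 2, 2, 2, 2),
  Year  = c(2012, 2015, 2012, 2013, 2015, 2016),
  AH    = c(1, NA, 1, 1, 1, 1),
  Jumbo = c(NA, 1, 1, NA, 1, 1)
)

# Hypothetical helper: adds one retention_<company> column per company column
add_retention <- function(data, company_cols) {
  key      <- paste(data$ID, data$Year)       # this row's ID-year key
  next_key <- paste(data$ID, data$Year + 1)   # same ID, following year
  next_row <- match(next_key, key)            # NA if the following year is not observed

  for (col in company_cols) {
    visited_next <- data[[col]][next_row]
    data[[paste0("retention_", col)]] <- ifelse(
      is.na(next_row),
      NA,                                                   # cannot be measured
      as.integer(!is.na(visited_next) & visited_next == 1)  # 1 = visited next year
    )
  }
  data$has_next_year <- !is.na(next_row)
  data
}

result <- add_retention(df, c("AH", "Jumbo"))

# Drop the years for which no following-year observation exists
result_kept <- result[result$has_next_year, ]
```

For the real data you would pass the vector of company column names as company_cols, for example by selecting them from names(df).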