
Recode a dataframe variable based on matches in a list


I am trying to recode a variable in a data frame based on matches to elements in a separate list. For example:

df <- data.frame(stringsAsFactors = FALSE,
  var1 = c("116", "117", "118", "SL1", "SL2", "234"))

matchList <- list(c("116, 117, and 118", "116", "117", "118"), 
c("SL1/SL2", "SL1", "SL2"))

df
var1
1     116
2     117
3     118
4     SL1
5     SL2
6     234

matchList
[[1]]
[1] "116, 117, and 118" "116"               "117"               "118"              

[[2]]
[1] "SL1/SL2" "SL1"     "SL2"    

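If a var1 value matches any item after the first in a matchList element (items 2-4 of the first element, items 2-3 of the second), it should be recoded to item 1 of that same element. Values with no match, such as 234, should stay unchanged. I want the recoded var1 to look like the following: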

var1
1     116, 117, and 118
2     116, 117, and 118
3     116, 117, and 118
4     SL1/SL2
5     SL1/SL2
6     234

The following lines of code work one list element at a time, but I'm not clear on how to automate this:

# get indices of matches for matchList element 1
r <- which(df$var1 %in% matchList[[1]]) 
# replace matches with first list item of list element 1 using indices of matches
df$var1[r] <- matchList[[1]][1] 

I've tried the following for loop, but I'm not sure what I'm missing:

for (i in length(matchList)){
  r <- which(df$var1 %in% matchList[[i]])
  df$var1[r] <- matchList[[i]][1]
}

Solution

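  • The issue is that length(matchList) is a single value, 2, so for (i in length(matchList)) runs the loop body only once, with i = 2, and only the second list element gets recoded. Instead, loop over the sequence of indices with seq_along(matchList), which yields 1 and 2: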

    for (i in seq_along(matchList)) {
      r <- which(df$var1 %in% matchList[[i]])
      df$var1[r] <- matchList[[i]][1]
    }
    df
    #               var1
    #1 116, 117, and 118
    #2 116, 117, and 118
    #3 116, 117, and 118
    #4           SL1/SL2
    #5           SL1/SL2
    #6               234
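
As a possible alternative to the loop, the same recoding can be sketched with a named lookup vector, which handles all list elements in one pass. This is not part of the original answer; the names lookup and recoded are made up for illustration, and the sketch assumes each code appears in at most one matchList element.

```r
df <- data.frame(stringsAsFactors = FALSE,
  var1 = c("116", "117", "118", "SL1", "SL2", "234"))

matchList <- list(c("116, 117, and 118", "116", "117", "118"),
                  c("SL1/SL2", "SL1", "SL2"))

# Build a named lookup vector (hypothetical helper name):
# names are the codes to match (items 2..n), values are the label (item 1).
lookup <- unlist(lapply(matchList, function(x) {
  setNames(rep(x[1], length(x) - 1), x[-1])
}))

# Look up every value of var1 at once; values with no match come back as NA.
recoded <- unname(lookup[df$var1])

# Keep the original value wherever no match was found.
df$var1 <- ifelse(is.na(recoded), df$var1, recoded)

df
```

Unlike the loop, this version does not modify df inside an iteration, so it may be easier to wrap in a function or apply to several columns.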