
Looping through levels of a factor and comparing one variable to another


I have a data.frame with one factor column and two character columns (nationalities). The factor has 2662 levels, each denoting a team. Teams have multiple members, so each level has about 6 rows.

What I want to do is loop through the teams and, for each member, check whether the nationality in the first character column appears anywhere in the other character column for the same team. When there is a match I want a vector to be marked with 1; when there is no match I want it to be marked with 2.

Illustration

Team    N1  N2
1      JPN  US
1      US   GER
1      DNK  RUS
2      …    …
2      …    …

Ideally, for team 1, my code would register a 1 for US (it appears in N2) and a 2 for JPN (it does not).

I've seen functions like split, tapply, etc... but I am having problems writing an anonymous function to achieve the goal I want:

tapply(Data, TEAM_ID, function () for (i in N1){if (N1 %in% N2) Identifyingvect <= 1} else {Identifyingvect <= 2})

Solution

  • This could probably be solved with by, but I prefer data.table for such tasks, something along these lines (btw, tapply is an aggregation function, so it won't work properly for assigning a value to each element in case there are dupes in N1):

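    library(data.table)
    # Convert Data to a data.table by reference, then, within each Team,
    # mark each N1 value: 1 if it occurs in that team's N2, 2 otherwise.
    setDT(Data)[, res := (!N1 %in% N2) + 1L, by = Team]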
    

    Honestly, I would prefer to keep res logical because it's both more intuitive and easier to operate on. However, in order to assign 2 to non-matches and 1 to matches, I had to test for non-matches instead of matches and then add 1.
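
For readers without data.table, the same per-team comparison can be sketched in base R. This is not the answer's method, only a minimal illustration using split and a plain loop. The toy data is an assumption: team 1 copies the question's illustration, while the team 2 rows and the column name match are made up here, since the original rows are elided.

```r
# Toy data: team 1 comes from the question's illustration;
# team 2 is hypothetical (the original rows are elided).
Data <- data.frame(
  Team = factor(c(1, 1, 1, 2, 2)),
  N1   = c("JPN", "US", "DNK", "FRA", "ITA"),
  N2   = c("US", "GER", "RUS", "ITA", "JPN"),
  stringsAsFactors = FALSE
)

# Group the row indices by team.
rows_by_team <- split(seq_len(nrow(Data)), Data$Team)

# Logical flag: TRUE when a member's N1 appears in the N2 column of the same team.
Data$match <- NA
for (rows in rows_by_team) {
  Data$match[rows] <- Data$N1[rows] %in% Data$N2[rows]
}

# Recode to the question's scheme: 1 = match, 2 = no match.
Data$res <- ifelse(Data$match, 1L, 2L)

print(Data)
```

With this toy data, res comes out as 2, 1, 2, 2, 1. JPN in team 1 is marked 2 even though JPN appears in team 2's N2, because the comparison is made within each team only. Keeping the logical match column alongside res shows the alternative the answer mentions: the TRUE/FALSE flag can be used directly for subsetting or counting.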