I have a data.frame with one factor column and two character columns (nationalities). The factor has 2662 levels, each denoting a team. Teams have multiple members, so each level has ~6 rows.
What I want to do is loop through the teams, take each member's nationality in one character column, and check whether it is present anywhere in the other character column for the same team. When there is a match I want a vector to be marked with 1; when there is no match I want it to be marked with 2.
Illustration
Team N1 N2
1 JPN US
1 US GER
1 DNK RUS
2 … …
2 … …
Ideally, my code would register a 1 for US and a 2 for JPN.
I've seen functions like split, tapply, etc... but I am having problems writing an anonymous function to achieve the goal I want:
tapply(Data, TEAM_ID, function () for (i in N1){if (N1 %in% N2) Identifyingvect <= 1} else {Identifyingvect <= 2})
This could probably be solved with by, but I prefer data.table for such tasks, something along these lines (btw, tapply is an aggregation function, so it won't work properly for assigning a value to each element if there are dupes in N1):
library(data.table)
# Within each team, flag N1 values found in that team's N2: 1 = match, 2 = no match
setDT(Data)[, res := (!N1 %in% N2) + 1L, by = Team]
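To see what that line does, here is a small sketch on the team 1 rows from the question's illustration. The team 2 rows are elided in the question, so the ones below are made up purely for illustration; they are chosen so that US appears in team 2's N1 but not in team 2's N2, which shows that the match is checked within each team only.

```r
library(data.table)

# Team 1 rows come from the illustration; team 2 rows are hypothetical.
Data <- data.table(
  Team = factor(c(1, 1, 1, 2, 2)),
  N1   = c("JPN", "US", "DNK", "US", "FRA"),
  N2   = c("US", "GER", "RUS", "FRA", "ITA")
)

# N1 %in% N2 is evaluated separately for each team because of by = Team.
# Negating it and adding 1L maps match -> 1 and no match -> 2.
Data[, res := (!N1 %in% N2) + 1L, by = Team]

print(Data)
```

In team 1, US gets 1 while JPN and DNK get 2. In the made-up team 2, US gets 2 because team 2's N2 does not contain it, and FRA gets 1.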
Honestly, I would prefer to keep res logical because it's both more intuitive and easier to operate on. But in order to assign 1 to matches (TRUE) and 2 to non-matches (FALSE), I had to test for non-matches instead of matches and then add 1.
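For comparison, here is a minimal base R sketch of the same per-team check that keeps the result logical and derives the 1/2 codes only at the end. It uses split and unsplit rather than by or data.table, so it is an alternative and not the approach shown above. The column name matched and the team 2 rows are assumptions added for illustration (the question elides team 2).

```r
# Team 1 rows come from the question's illustration; team 2 rows are hypothetical.
Data <- data.frame(
  Team = factor(c(1, 1, 1, 2, 2)),
  N1   = c("JPN", "US", "DNK", "US", "FRA"),
  N2   = c("US", "GER", "RUS", "FRA", "ITA"),
  stringsAsFactors = FALSE
)

# Split the rows by team, test membership within each team,
# then put the per-team results back in the original row order.
per_team <- lapply(split(Data, Data$Team), function(d) d$N1 %in% d$N2)
Data$matched <- unsplit(per_team, Data$Team)

# Derive the requested codes from the logical column: TRUE -> 1, FALSE -> 2.
Data$res <- ifelse(Data$matched, 1L, 2L)

print(Data)
```

Keeping matched as a logical column makes follow-up steps such as sum(Data$matched) or Data[Data$matched, ] straightforward, and the numeric codes can still be produced whenever they are needed.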