I have 2 data frames, one I want to use each row of it as a condition to apply it in the second data frame. This is an example of them:
condition_df <- data.frame(
type = c("3701","3801","4901","4901", "3701"),
position = c(51,51, 22,22, 13),
subtype = c("M", "M","D","K", "E")
)
df <- data.frame(
id = 1:5,
type1 = c("3701", "4901", "4902", "4903", "1501"),
type2 = c("3801", "4901", "4901","5050","6069"),
type3 = c("3901", "5001", "8901","5050","6069"),
type4 = c("4001", "3901", "7901","5050","6069"),
`51` = c("M", "A", "M", "M", "A"),
`22` = c("D", "D", "K", "A", "D"),
`13` = c("E", "G", "E", "E", "G"),
`55` = c("M", "A", "M", "M", "A"),
`25` = c("D", "D", "K", "A", "D"),
`16` = c("E", "G", "E", "E", "G"), check.names = FALSE
)
so I want to get a score in the second data frame(df), the way I want to apply it is by using condition data frame rows to loop over each row of data frame. The condition is if the type is not present in the columns of df (type1:type4), then search for the column that has the name == position (in condition_df), and count the numbers of the letters in that column that only match subtype in the condition_df. The final score should be the sum of all the conditions of condition_df for each ID in df. Example ID 1 score is only 1 as we got 0 for the first and the last conditions (because it has the type so we ignore the subtype) and got 1 as it has the second subtype "D" and don't have its type. Any help is appreciated.
I tried something like this but didnot work:
for(i in seq_len(nrow(condition_df))) {
for(j in seq_len(nrow(df))) {
type <- condition_df$type[i]
position <- as.character(condition_df$position[i])
subtype <- condition_df$subtype[i]
df<-df %>%
mutate(cir_score=case_when(!any(across(starts_with("type"))==type)~sum(get(as.character(position))==subtype),
TRUE~0
))
}
}
Pivot to long form on the position columns (51, 22, ...) and then join with the condition data frame. Then filter out the rows where the type matches types 1-4, then the score is the number of records left for each id.
full_join(df,
pivot_longer(df, -c(id, starts_with("type")),
names_to="position", values_to="subtype",
names_transform=list(position=as.integer)) |>
inner_join(condition_df, by=c("position", "subtype"), relationship = "many-to-many") |>
filter(!if_any(matches("type\\d+"), ~ .x==type)) |>
distinct(id, position, subtype) |>
count(id, name="score"),
by="id") |>
mutate(score=replace_na(score, 0))
id type1 type2 type3 type4 51 22 13 55 25 16 score
1 1 3701 3801 3901 4001 M D E M D E 1
2 2 4901 4901 5001 3901 A D G A D G 0
3 3 4902 4901 8901 7901 M K E M K E 2
4 4 4903 5050 5050 5050 M A E M A E 2
5 5 1501 6069 6069 6069 A D G A D G 1