Search code examples
r

Use rows of dataframe to apply conditions on another dataframe in R


I have 2 data frames, one I want to use each row of it as a condition to apply it in the second data frame. This is an example of them:

condition_df <- data.frame(
  type = c("3701","3801","4901","4901", "3701"),
  position = c(51,51, 22,22, 13),
  subtype = c("M", "M","D","K", "E")
)
df <- data.frame(
  id = 1:5,
  type1 = c("3701", "4901", "4902", "4903", "1501"),
  type2 = c("3801", "4901", "4901","5050","6069"),
  type3 = c("3901", "5001", "8901","5050","6069"),
  type4 = c("4001", "3901", "7901","5050","6069"),
  `51` = c("M", "A", "M", "M", "A"),
  `22` = c("D", "D", "K", "A", "D"),
  `13` = c("E", "G", "E", "E", "G"),
  `55` = c("M", "A", "M", "M", "A"),
  `25` = c("D", "D", "K", "A", "D"),
  `16` = c("E", "G", "E", "E", "G"), check.names = FALSE
)

so I want to get a score in the second data frame(df), the way I want to apply it is by using condition data frame rows to loop over each row of data frame. The condition is if the type is not present in the columns of df (type1:type4), then search for the column that has the name == position (in condition_df), and count the numbers of the letters in that column that only match subtype in the condition_df. The final score should be the sum of all the conditions of condition_df for each ID in df. Example ID 1 score is only 1 as we got 0 for the first and the last conditions (because it has the type so we ignore the subtype) and got 1 as it has the second subtype "D" and don't have its type. Any help is appreciated.

I tried something like this but didnot work:

for(i in seq_len(nrow(condition_df))) {
  for(j in seq_len(nrow(df))) {
    type <- condition_df$type[i]
    position <- as.character(condition_df$position[i])
    subtype <- condition_df$subtype[i]
    
    df<-df %>% 
      mutate(cir_score=case_when(!any(across(starts_with("type"))==type)~sum(get(as.character(position))==subtype),
                                 
                                 TRUE~0
      ))
      
  }
} 

Solution

  • Pivot to long form on the position columns (51, 22, ...) and then join with the condition data frame. Then filter out the rows where the type matches types 1-4, then the score is the number of records left for each id.

    full_join(df,
          pivot_longer(df, -c(id, starts_with("type")),
                       names_to="position", values_to="subtype",
                       names_transform=list(position=as.integer)) |>
           inner_join(condition_df, by=c("position", "subtype"), relationship = "many-to-many") |> 
           filter(!if_any(matches("type\\d+"), ~ .x==type)) |>
           distinct(id, position, subtype) |>
           count(id, name="score"),
              by="id") |>
      mutate(score=replace_na(score, 0))
    

      id type1 type2 type3 type4 51 22 13 55 25 16 score
    1  1  3701  3801  3901  4001  M  D  E  M  D  E     1
    2  2  4901  4901  5001  3901  A  D  G  A  D  G     0
    3  3  4902  4901  8901  7901  M  K  E  M  K  E     2
    4  4  4903  5050  5050  5050  M  A  E  M  A  E     2
    5  5  1501  6069  6069  6069  A  D  G  A  D  G     1