Search code examples
rif-statementconditional-statementssubset

How to check different conditions in a data frame based on the values in another data frame?


UPDATED:

I have two data frames (df1 and df2) as you can see in the tables below. Based on the conditions in df1, I need to check the values in df2. The expected output is also presented.
The condition is based on the columns dep and value in df1 checked in df2 against the the variables taken from the column var in df1. Please see the following examples:

For instance, based on row 1 in df1, we judge whether the values in column A in df2 are TRUE or FALSE.

  1. if E == 1, A == TRUE

  2. if E != 1, we need to check the following conditions:

    - if A == NA, A == TRUE

    - if A == any values other than NA, A == FALSE

  3. If A and E are both NAs, A == TRUE

similarly, for row 3 in df1 it would be:

  1. if A != 0, C == TRUE

  2. if A == 0, we need to check the value of C based on the following conditions:

    - if C == NA, C == TRUE

    - if C == any values other than NA, C == FALSE

  3. If A and C are both NAs, C == TRUE

Besides, in df1, when value == NA, we do not need to do anything (these rows can be removed).

Importantly, in my real df2, the variables belong to various classes and are not just integer/numeric.

I am all eyes to read your advice.

df1:

| var  |value| dep  |
|---   |---- | ---- |
| A    |==1  |  E   |
| B    |==1  |  E   |
| C    |!=0  |  A   |
| D    |==2  |  G   |
| E    |NA   |  NA  |
| F    |NA   |  NA  |
| G    |NA   |  NA  |

df2:

| ID  | B  | G  | A   |C   |D  |E  |
|---  |--- | ---| --  |--- |---|---|
| 1q  | 1  | NA | 0   | NA | NA| 2 |
| 2d  | 1  |  3 | 0   | 1  | 1 | 2 |
| 4f  | NA | NA | 1   | 1  | 1 | 1 |
| 3g  | 1  | 2  | 1   | NA | 1 | NA|
| 8j  | 2  | 2  | 1   | NA | 1 | NA|
| 5g  | NA | NA | NA  | 1  | 3 | 3 |
| 9l  | 1  | NA | NA  | 1  | 2 |NA |

Desired output would be:

| ID  | B  |  A | C |D  |
|---  |--- | ---|---|---|
| 1q  | F  | F  | T | T |
| 2d  | F  | F  | F | F | 
| 4f  | T  | T  | T | F | 
| 3g  | F  | F  | T | T | 
| 8j  | F  | F  | T | T | 
| 5g  | T  | T  | T | F | 
| 9l  | T  | T  | T | F |

Solution

  • Here F is character variable, and G is modified based on it

    library(tidyverse)
    library(rlang)
    library(glue)
    rules_1 <- tibble::tribble(
      ~var, ~value,    ~dep,
      "A",   "==1",    "E", 
      "B",   "==1",    "E", 
      "C",   "!=0",    "A", 
      "D",   "==2",    "G", 
      "E",      NA,     NA, 
      "F",      NA,     NA, 
      "G",    "%in% c('b','d')",     "F",
    )
    
    df2 <- data.frame(
      stringsAsFactors = FALSE,
      ID = c("1q", "2d", "4f", "3g", "8j", "5g", "9l"),
      B = c(1L, 1L, NA, 1L, 2L, NA, 1L),
      G = c(3L, 3L, NA, 2L, 2L, NA, NA),
      A = c(0L, 0L, 1L, 1L, 1L, NA, NA),
      C = c(NA, 1L, 1L, NA, NA, 1L, 1L),
      D = c(NA, 1L, 1L, 1L, 1L, 3L, 2L),
      E = c(2L, 2L, 1L, NA, NA, 3L, 1L),
      F = letters[1:7]
    )
    
    # And for variables that have NA values in df1, we do not need to do anything. 
    
    (rules_2 <- filter(rules_1,
                       !is.na(dep)))
    
    # rules from data
    (rules_3 <- mutate(rules_2,
                       rule = glue("case_when({dep}{value}~TRUE,is.na({var})~TRUE,TRUE ~ FALSE)")))
    
    (mutators <- rules_3$rule)
    names(mutators) <- rules_3$var
    
    (parsed_mutators <- rlang::parse_exprs(mutators))
    
    mutate(df2,
           !!!parsed_mutators)