Creating a function to modify a list of variables conditionally using the case_when function in R


I would like to create several (30+) variables based on the following function, where N, B, C, and Q each stand for one of a list of variables in the df data frame. I've included the code as well as the error I receive from R. An example of the data is provided below, where "..." represents the remaining columns, each of which contains 0, 1, or a missing value.

    record_id timeframe                ...
95          2         0                  0
94          2         1                 NA
19          6         0                  1
17          6         1                 NA
18          6         2                 NA
75          9         0                  1
73          9         1                  0
74          9         2                 NA



lag_vars <- function(df, N, B, C, Q){
  df <- df %>% group_by(record_id) %>%
    mutate(N = case_when(
      timeframe == 0 ~ B,
      timeframe > 0 & C == 1 ~ Q,
      timeframe > 0 & C == 0 ~ lag(N)
    ))

  return(df)
}

lag_vars(t, Nt, Bt, Ct, Qt)

Which returns an error:

Error in `mutate()`:
! Problem while computing `N = case_when(...)`.
ℹ The error occurred in group 1: record_id = 2.
Caused by error in `case_when()`:
! `timeframe == 0 ~ B`, `timeframe > 0 & C == 1 ~ Q`, `timeframe > 0 & C == 0 ~ lag(N)` must be length 2 or one, not 3.
Run `rlang::last_trace()` to see where the error occurred.
Called from: signal_abort(cnd, .file)
Warning messages:
1: Problem while computing `N = case_when(...)`.
ℹ longer object length is not a multiple of shorter object length
ℹ The warning occurred in group 1: record_id = 2. 
2: Problem while computing `N = case_when(...)`.
ℹ longer object length is not a multiple of shorter object length
ℹ The warning occurred in group 1: record_id = 2.

Is case_when able to utilize vectors? Or can I place case_when within another function?


Solution

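  • I have changed the function so that record_id and timeframe are no longer hard-coded. I also added an extra argument, Nt2, which names a new output column, so the result can be compared against the original column Nt. The key point is that column names passed as bare arguments must be captured and injected with !!enquo(colname) or {{colname}}. Without that, N, B, C, and Q are not resolved as columns of df, and N = ... creates a column literally named N. Because the name of the new column is injected as well, the assignment inside mutate() uses := instead of =.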

    library(dplyr)
    library(rlang)
    
    
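    # id, time, N, B, C, Q: existing columns, passed unquoted
    # Nt2: name of the new column to create, passed unquoted
    lag_vars <- function(df, id, time, N, B, C, Q, Nt2){
      out <- df %>% group_by(!! enquo(id)) %>%
        # `:=` is required because the left-hand side is injected with !!
        mutate(!! enquo(Nt2) := case_when(
                          !! enquo(time) == 0 ~ !! enquo(B),
                          !! enquo(time) > 0 & !! enquo(C) == 1 ~ !! enquo(Q),
                          !! enquo(time) > 0 & !! enquo(C) == 0 ~ lag(!! enquo(N))
        ))
      
      return(out)
    }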
    
    lag_vars(df1, record_id, timeframe, Nt, Bt, Ct, Qt, Ntest)
    
    #> # A tibble: 8 × 7
    #> # Groups:   record_id [3]
    #>   record_id timeframe    Nt    Bt    Ct    Qt Ntest
    #>       <int>     <int> <int> <int> <int> <int> <int>
    #> 1         2         0     0     1     0     0     1
    #> 2         2         1    NA     0    NA    NA    NA
    #> 3         6         0     1    NA     1     1    NA
    #> 4         6         1    NA    NA     0    NA     1
    #> 5         6         2    NA    NA     1    NA    NA
    #> 6         9         0     1     0    NA     1     0
    #> 7         9         1     0     1     1     0     0
    #> 8         9         2    NA    NA    NA    NA    NA
    

    Data:

    read.table(text = "record_id timeframe Nt Bt Ct Qt
                               2         0  0  1  0  0
                               2         1 NA  0 NA NA
                               6         0  1 NA  1  1
                               6         1 NA NA  0 NA
                               6         2 NA NA  1 NA
                               9         0  1  0 NA  1
                               9         1  0  1  1  0
                               9         2 NA NA NA NA",
               header = TRUE, stringsAsFactors = FALSE) -> df1
    

    Created on 2024-04-02 with reprex v2.0.2
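
The answer mentions {{colname}} as an alternative but only shows the !!enquo() form. Below is a minimal sketch of the same function written with {{ }} instead. The names lag_vars_embrace and new_col are hypothetical and chosen only for this illustration. The data is the same df1 as above, and the sketch assumes a dplyr/rlang version recent enough to support {{ }} (rlang 0.4.0 or later).

```r
library(dplyr)

# Same data as in the answer above.
df1 <- read.table(text = "record_id timeframe Nt Bt Ct Qt
2 0 0 1 0 0
2 1 NA 0 NA NA
6 0 1 NA 1 1
6 1 NA NA 0 NA
6 2 NA NA 1 NA
9 0 1 0 NA 1
9 1 0 1 1 0
9 2 NA NA NA NA", header = TRUE)

# Hypothetical name: lag_vars_embrace mirrors lag_vars(), but uses {{ }}
# in place of !! enquo(). All column arguments are passed unquoted.
lag_vars_embrace <- function(df, id, time, N, B, C, Q, new_col) {
  df %>%
    group_by({{ id }}) %>%
    # `:=` is still needed because the new column name is injected
    mutate({{ new_col }} := case_when(
      {{ time }} == 0 ~ {{ B }},
      {{ time }} > 0 & {{ C }} == 1 ~ {{ Q }},
      {{ time }} > 0 & {{ C }} == 0 ~ lag({{ N }})
    ))
}

res <- lag_vars_embrace(df1, record_id, timeframe, Nt, Bt, Ct, Qt, Ntest)
res
```

With df1 as defined here, the Ntest column should match the one in the output shown above. The {{ }} form is only shorthand for the capture-and-inject pattern, so the choice between the two is mostly a matter of readability.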