
Variable masking in tidyverse programming in R


It is hard to reuse tidyverse code inside functions. I tried to subset .data with [[ but received a splicing error. Below I provide an example and a workaround that uses an "if" statement inside a tidy function. Is it possible to use variable masking in tidyverse programming?

Data frame:

set.seed(123)
(df=data.frame(
  Yrs_Before=sample(1:8, 3),
  Yrs_After=sample(1:8, 3),
  Before.Yr_1=sample(1:8, 3),
  Before.Yr_2=sample(1:8, 3),
  Before.Yr_3=sample(1:8, 3),
  Before.Yr_4=sample(1:8, 3),
  Before.Yr_5=sample(1:8, 3),
  Before.Yr_6=sample(1:8, 3),
  Before.Yr_7=sample(1:8, 3),
  Before.Yr_8=sample(1:8, 3),
  After.Yr_1=sample(1:8, 3),
  After.Yr_2=sample(1:8, 3),
  After.Yr_3=sample(1:8, 3),
  After.Yr_4=sample(1:8, 3),
  After.Yr_5=sample(1:8, 3),
  After.Yr_6=sample(1:8, 3),
  After.Yr_7=sample(1:8, 3),
  After.Yr_8=sample(1:8, 3)
  
))

Is it possible to use variable masking in the following function?

sums=function(data,crashes,yrs){
  data %>%
    dplyr::rowwise() %>%
    dplyr::transmute(sum = cumsum(c_across(matches(.data[[crashes]])))[.data[[yrs]]])
}

However, an error was received:


sums(df,"After.Yr")
Error in splice(dot_call(capture_dots, frame_env = frame_env, named = named,  : 
  argument "yrs" is missing, with no default
Called from: splice(dot_call(capture_dots, frame_env = frame_env, named = named, 
    ignore_empty = ignore_empty, unquote_names = unquote_names, 
    homonyms = homonyms, check_assign = check_assign))

The same happens for counts occurring during the respective "before" periods (e.g. "Before.Yr"):

sums(df,"Before.Yr")
Error in splice(dot_call(capture_dots, frame_env = frame_env, named = named,  : 
  argument "yrs" is missing, with no default
Called from: splice(dot_call(capture_dots, frame_env = frame_env, named = named, 
    ignore_empty = ignore_empty, unquote_names = unquote_names, 
    homonyms = homonyms, check_assign = check_assign))

The following version uses an "if" statement and gives the desired results, shown below for the "before" (Before.Yr) and "after" (After.Yr) periods.


sums = function(data, counts) {
  data %>%
    dplyr::rowwise() %>%
    dplyr::transmute(sums = if (counts == "Before.Yr") {
      cumsum(c_across(matches('Before.Yr')))[Yrs_Before]
    } else {
      cumsum(c_across(matches('After.Yr')))[Yrs_After]
    })
}


Using crashes in the after period:

sums(df,"After.Yr")
# A tibble: 3 × 1
# Rowwise: 
   sums
  <int>
1    21
2    20
3     6

Using crashes in the before period:

sums(df,"Before.Yr")
# A tibble: 3 × 1
# Rowwise: 
   sums
  <int>
1    23
2    33
3    11

Solution

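  • Instead of matches(.data[[crashes]]), simply use matches(crashes): matches() expects a regular expression given as a string, whereas .data[[crashes]] would look up a column named exactly "After.Yr", which does not exist. And of course you have to pass a column name for yrs; the error above arises because sums() was called without it: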

    library(dplyr)
    
    sums <- function(data, crashes, yrs) {
      data %>%
        dplyr::rowwise() %>%
        dplyr::transmute(sum = cumsum(c_across(matches(crashes)))[.data[[yrs]]])
    }
    
    sums(df, "After.Yr", "Yrs_After")
    #> # A tibble: 3 × 1
    #> # Rowwise: 
    #>     sum
    #>   <int>
    #> 1    21
    #> 2    20
    #> 3     6
    
    
    sums(df, "Before.Yr", "Yrs_Before")
    #> # A tibble: 3 × 1
    #> # Rowwise: 
    #>     sum
    #>   <int>
    #> 1    23
    #> 2    33
    #> 3    11
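
If you would rather pass yrs as a bare column name than as a string, embracing the argument with {{ }} is one possible variant. The sketch below is only an illustration under stated assumptions: the function name sums_embraced and the small data frame toy are hypothetical and not part of the question, and toy is filled by hand so the result can be checked without relying on sample().

```r
library(dplyr)

# Hypothetical hand-made data (not the df from the question)
toy <- data.frame(
  Yrs_After  = c(2, 3, 1),
  After.Yr_1 = c(1, 4, 7),
  After.Yr_2 = c(2, 5, 8),
  After.Yr_3 = c(3, 6, 9)
)

# Hypothetical variant of sums(): `crashes` stays a string (a regex for
# matches()), while `yrs` is a bare column name forwarded with {{ }}.
sums_embraced <- function(data, crashes, yrs) {
  data %>%
    dplyr::rowwise() %>%
    dplyr::transmute(sum = cumsum(c_across(matches(crashes)))[{{ yrs }}]) %>%
    dplyr::ungroup()
}

res <- sums_embraced(toy, "After.Yr", Yrs_After)
res$sum
# Row 1: cumsum(1, 2, 3) = 1, 3, 6   -> element 2 is 3
# Row 2: cumsum(4, 5, 6) = 4, 9, 15  -> element 3 is 15
# Row 3: cumsum(7, 8, 9) = 7, 15, 24 -> element 1 is 7
```

Whether to use .data[[yrs]] or {{ yrs }} is mostly a matter of how you want to call the function: the first takes a string such as "Yrs_After", the second takes the column name unquoted. Note that the "." in "After.Yr" is a regex wildcard inside matches(), which is harmless here but worth keeping in mind if column names are similar.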