Search code examples
rmasking

create a function using data masking in R


inorder to do a paired analysis i need to write a function that sums integer counts. The total number required to be summed is specified in the corresponding "Yrs_Before" and "Yrs_After" columns in "df". Is there a way i can avoid writing "fm_after" with just one function? Is this a variable masking for "Yrs_Before", "Yrs_After","Before.Yr_1...n+1" and "After.Yr_1...n+1" columns?

data frame

set.seed(123)
(df=data.frame(
  Yrs_Before=sample(1:8, 3),
  Yrs_After=sample(1:8, 3),
  Before.Yr_1=sample(1:8, 3),
  Before.Yr_2=sample(1:8, 3),
  Before.Yr_3=sample(1:8, 3),
  Before.Yr_4=sample(1:8, 3),
  Before.Yr_5=sample(1:8, 3),
  Before.Yr_6=sample(1:8, 3),
  Before.Yr_7=sample(1:8, 3),
  Before.Yr_8=sample(1:8, 3),
  After.Yr_1=sample(1:8, 3),
  After.Yr_2=sample(1:8, 3),
  After.Yr_3=sample(1:8, 3),
  After.Yr_4=sample(1:8, 3),
  After.Yr_5=sample(1:8, 3),
  After.Yr_6=sample(1:8, 3),
  After.Yr_7=sample(1:8, 3),
  After.Yr_8=sample(1:8, 3)
  
))

function sums the corresponding rows based on the number of years in the before period.

fm=function(data,Yrs_Before){
  data |> dplyr::mutate(sums=
  ifelse(
  Yrs_Before == 1, rowSums(across(Before.Yr_1)),
  ifelse(
    Yrs_Before == 2, rowSums(across(Before.Yr_1:Before.Yr_2)),
    ifelse(
      Yrs_Before == 3, rowSums(across(Before.Yr_1:Before.Yr_3)),
      ifelse(
        Yrs_Before == 4, rowSums(across(Before.Yr_1:Before.Yr_4)),
        ifelse(
          Yrs_Before == 5, rowSums(across(Before.Yr_1:Before.Yr_5)),
          ifelse(
            Yrs_Before == 6, rowSums(across(Before.Yr_1:Before.Yr_6)),
            ifelse(
              Yrs_Before == 7, rowSums(across(Before.Yr_1:Before.Yr_7)),
              ifelse(
                Yrs_Before == 8, rowSums(across(Before.Yr_1:Before.Yr_8)),"")))))))))
}

output


fm(df,Yrs_Before)
Yrs_Before Yrs_After Before.Yr_1 Before.Yr_2 Before.Yr_3 Before.Yr_4 Before.Yr_5 Before.Yr_6 Before.Yr_7 Before.Yr_8 After.Yr_1 After.Yr_2 After.Yr_3 After.Yr_4 After.Yr_5 After.Yr_6 After.Yr_7 After.Yr_8 sums
1          7         6           2           5           6           3           3           1           3           1          4          3          4          5          1          4          3          2   23
2          8         3           6           4           1           5           1           8           2           6          6          7          7          7          2          5          6          5   33
3          3         2           3           6           2           8           4           5           7           3          1          5          2          1          3          7          1          7   11

duplicated function with variable name changed to "After"


fm_after=function(data,Yrs_After){
  data |> dplyr::mutate(sums=
                   ifelse(
                     Yrs_After == 1, rowSums(across(After.Yr_1)),
                     ifelse(
                       Yrs_After == 2, rowSums(across(After.Yr_1:After.Yr_2)),
                       ifelse(
                         Yrs_After == 3, rowSums(across(After.Yr_1:After.Yr_3)),
                         ifelse(
                           Yrs_After == 4, rowSums(across(After.Yr_1:After.Yr_4)),
                           ifelse(
                             Yrs_After == 5, rowSums(across(After.Yr_1:After.Yr_5)),
                             ifelse(
                               Yrs_After == 6, rowSums(across(After.Yr_1:After.Yr_6)),
                               ifelse(
                                 Yrs_After == 7, rowSums(across(After.Yr_1:After.Yr_7)),
                                 ifelse(
                                   Yrs_After == 8, rowSums(across(After.Yr_1:After.Yr_8)),"")))))))))
}

output

fm_after(df,Yrs_After)
Yrs_Before Yrs_After Before.Yr_1 Before.Yr_2 Before.Yr_3 Before.Yr_4 Before.Yr_5 Before.Yr_6 Before.Yr_7 Before.Yr_8 After.Yr_1 After.Yr_2 After.Yr_3 After.Yr_4 After.Yr_5 After.Yr_6 After.Yr_7 After.Yr_8 sums
1          7         6           2           5           6           3           3           1           3           1          4          3          4          5          1          4          3          2   21
2          8         3           6           4           1           5           1           8           2           6          6          7          7          7          2          5          6          5   20
3          3         2           3           6           2           8           4           5           7           3          1          5          2          1          3          7          1          7    6

UPDATE:

As recommended, the following tidyverse function was used for counting Before.Yr and After.Yr crash counts.

sums=function(data,crashes,yrs){
data %>%
  dplyr::rowwise() %>%
    dplyr::transmute(sum = cumsum(c_across(matches(.data[[crashes]])))[.data[[yrs]]])
}

however an error was returned.

> sums(df,"Before.Yr","Yrs_Before")
Error: Problem with `mutate()` column `sum`.
ℹ `sum = cumsum(c_across(matches(.data[["Before.Yr"]])))[.data[["Yrs_Before"]]]`.
x Column `Before.Yr` not found in `.data`
ℹ The error occurred in row 1.
Run `rlang::last_error()` to see where the error occurred.

suggestions for fixing the error?


Solution

  • EDIT:

    in base R you could do:

    A <- split.default(df, sub('.*(After|Before).*', '\\1', names(df)))  
    sapply(A, \(x) rowSums(x[-1] * (col(x[-1]) <= x[[1]])))
        After Before
    [1,]    21     23
    [2,]    20     33
    [3,]     6     11
    

    You can then cbind this to the original dataframe

    tidyverse:

    use mutate instead of transmute in order to retain the original dataframe.

    df %>%
      rowwise() %>%
      transmute(After = cumsum(c_across(matches('Before.Yr')))[Yrs_Before],
                Before = cumsum(c_across(matches('After.Yr')))[Yrs_After]) 
    
    # A tibble: 3 x 2
    # Rowwise: 
      After Before
      <int>  <int>
    1    23     21
    2    33     20
    3    11      6