inorder to do a paired analysis i need to write a function that sums integer counts. The total number required to be summed is specified in the corresponding "Yrs_Before" and "Yrs_After" columns in "df". Is there a way i can avoid writing "fm_after" with just one function? Is this a variable masking for "Yrs_Before", "Yrs_After","Before.Yr_1...n+1" and "After.Yr_1...n+1" columns?
data frame
set.seed(123)
(df=data.frame(
Yrs_Before=sample(1:8, 3),
Yrs_After=sample(1:8, 3),
Before.Yr_1=sample(1:8, 3),
Before.Yr_2=sample(1:8, 3),
Before.Yr_3=sample(1:8, 3),
Before.Yr_4=sample(1:8, 3),
Before.Yr_5=sample(1:8, 3),
Before.Yr_6=sample(1:8, 3),
Before.Yr_7=sample(1:8, 3),
Before.Yr_8=sample(1:8, 3),
After.Yr_1=sample(1:8, 3),
After.Yr_2=sample(1:8, 3),
After.Yr_3=sample(1:8, 3),
After.Yr_4=sample(1:8, 3),
After.Yr_5=sample(1:8, 3),
After.Yr_6=sample(1:8, 3),
After.Yr_7=sample(1:8, 3),
After.Yr_8=sample(1:8, 3)
))
function sums the corresponding rows based on the number of years in the before period.
fm=function(data,Yrs_Before){
data |> dplyr::mutate(sums=
ifelse(
Yrs_Before == 1, rowSums(across(Before.Yr_1)),
ifelse(
Yrs_Before == 2, rowSums(across(Before.Yr_1:Before.Yr_2)),
ifelse(
Yrs_Before == 3, rowSums(across(Before.Yr_1:Before.Yr_3)),
ifelse(
Yrs_Before == 4, rowSums(across(Before.Yr_1:Before.Yr_4)),
ifelse(
Yrs_Before == 5, rowSums(across(Before.Yr_1:Before.Yr_5)),
ifelse(
Yrs_Before == 6, rowSums(across(Before.Yr_1:Before.Yr_6)),
ifelse(
Yrs_Before == 7, rowSums(across(Before.Yr_1:Before.Yr_7)),
ifelse(
Yrs_Before == 8, rowSums(across(Before.Yr_1:Before.Yr_8)),"")))))))))
}
output
fm(df,Yrs_Before)
Yrs_Before Yrs_After Before.Yr_1 Before.Yr_2 Before.Yr_3 Before.Yr_4 Before.Yr_5 Before.Yr_6 Before.Yr_7 Before.Yr_8 After.Yr_1 After.Yr_2 After.Yr_3 After.Yr_4 After.Yr_5 After.Yr_6 After.Yr_7 After.Yr_8 sums
1 7 6 2 5 6 3 3 1 3 1 4 3 4 5 1 4 3 2 23
2 8 3 6 4 1 5 1 8 2 6 6 7 7 7 2 5 6 5 33
3 3 2 3 6 2 8 4 5 7 3 1 5 2 1 3 7 1 7 11
duplicated function with variable name changed to "After"
fm_after=function(data,Yrs_After){
data |> dplyr::mutate(sums=
ifelse(
Yrs_After == 1, rowSums(across(After.Yr_1)),
ifelse(
Yrs_After == 2, rowSums(across(After.Yr_1:After.Yr_2)),
ifelse(
Yrs_After == 3, rowSums(across(After.Yr_1:After.Yr_3)),
ifelse(
Yrs_After == 4, rowSums(across(After.Yr_1:After.Yr_4)),
ifelse(
Yrs_After == 5, rowSums(across(After.Yr_1:After.Yr_5)),
ifelse(
Yrs_After == 6, rowSums(across(After.Yr_1:After.Yr_6)),
ifelse(
Yrs_After == 7, rowSums(across(After.Yr_1:After.Yr_7)),
ifelse(
Yrs_After == 8, rowSums(across(After.Yr_1:After.Yr_8)),"")))))))))
}
output
fm_after(df,Yrs_After)
Yrs_Before Yrs_After Before.Yr_1 Before.Yr_2 Before.Yr_3 Before.Yr_4 Before.Yr_5 Before.Yr_6 Before.Yr_7 Before.Yr_8 After.Yr_1 After.Yr_2 After.Yr_3 After.Yr_4 After.Yr_5 After.Yr_6 After.Yr_7 After.Yr_8 sums
1 7 6 2 5 6 3 3 1 3 1 4 3 4 5 1 4 3 2 21
2 8 3 6 4 1 5 1 8 2 6 6 7 7 7 2 5 6 5 20
3 3 2 3 6 2 8 4 5 7 3 1 5 2 1 3 7 1 7 6
UPDATE:
As recommended, the following tidyverse function was used for counting Before.Yr and After.Yr crash counts.
sums=function(data,crashes,yrs){
data %>%
dplyr::rowwise() %>%
dplyr::transmute(sum = cumsum(c_across(matches(.data[[crashes]])))[.data[[yrs]]])
}
however an error was returned.
> sums(df,"Before.Yr","Yrs_Before")
Error: Problem with `mutate()` column `sum`.
ℹ `sum = cumsum(c_across(matches(.data[["Before.Yr"]])))[.data[["Yrs_Before"]]]`.
x Column `Before.Yr` not found in `.data`
ℹ The error occurred in row 1.
Run `rlang::last_error()` to see where the error occurred.
suggestions for fixing the error?
EDIT:
in base R you could do:
A <- split.default(df, sub('.*(After|Before).*', '\\1', names(df)))
sapply(A, \(x) rowSums(x[-1] * (col(x[-1]) <= x[[1]])))
After Before
[1,] 21 23
[2,] 20 33
[3,] 6 11
You can then cbind
this to the original dataframe
tidyverse:
use mutate
instead of transmute
in order to retain the original dataframe.
df %>%
rowwise() %>%
transmute(After = cumsum(c_across(matches('Before.Yr')))[Yrs_Before],
Before = cumsum(c_across(matches('After.Yr')))[Yrs_After])
# A tibble: 3 x 2
# Rowwise:
After Before
<int> <int>
1 23 21
2 33 20
3 11 6