This feels like a basic question, but I can't seem to figure out how to do it. I want to insert a new row in my data frame after performing a calculation, and then loop that procedure (calculation and inserting a row with the output) for each participant I have.
My data frame looks something like this (with 9 subjects total), where I have 8 pre-post treatment outcomes per subject:
| subID | treatment | outcome |
| ----- | ---------------------------- | ------- |
| 14325 | affected_pre_1 | 0.45 |
| 14325 | affected_pre_2 | 0.25 |
| 14325 | unaffected_pre_1 | 1.05 |
| 14325 | unaffected_pre_2 | 1.23 |
| 14325 | affected_post_1 | 0.22 |
| 14325 | affected_post_2 | 0.45 |
| 14325 | unaffected_post_1 | 0.87 |
| 14325 | unaffected_post_2 | 0.34 |
| 18964 | affected_pre_1 | 0.90 |
| 18964 | affected_pre_2 | 1.13 |
| 18964 | unaffected_pre_1 | 0.55 |
| 18964 | unaffected_pre_2 | 0.23 |
| 18964 | affected_post_1 | 0.17 |
| 18964 | affected_post_2 | 1.22 |
| 18964 | unaffected_post_1 | 1.89 |
| 18964 | unaffected_post_2 | 0.76 |
| 39274 | affected_pre_1 | 0.88 |
| 39274 | affected_pre_2 | 0.12 |
| 39274 | unaffected_pre_1 | 0.05 |
| 39274 | unaffected_pre_2 | 0.34 |
| 39274 | affected_post_1 | 0.99 |
| 39274 | affected_post_2 | 0.32 |
| 39274 | unaffected_post_1 | 1.81 |
| 39274 | unaffected_post_2 | 1.34 |
| ... | ... | ... |
I want to create 4 new rows for each subject with the following calculations (essentially normalizing some of the measures):

- row 1 = `affected_pre_2 / affected_pre_1`
- row 2 = `unaffected_pre_2 / unaffected_pre_1`
- row 3 = `affected_post_2 / affected_post_1`
- row 4 = `unaffected_post_2 / unaffected_post_1`
Based on a previous answer, I've tried the following:
```r
data %>%
  bind_rows(data %>% summarise(subID = str_c(subID, lead(subID), sep = '_')) %>% na.omit() %>%
              bind_cols(data %>% summarise(across(5:11, ~ . - lead(.))) %>% na.omit()))
```
The `across(5:11, ...)` refers to the multiple outcome variables I have (in my example dataset I included only one outcome variable, for simplicity).
However, I got a warning that `summarise()` was deprecated. Is there a better way to do this?
I think I might also need to use the `apply()` function somewhere in there, but then I'm not sure how to loop everything through each subject. Any help would be greatly appreciated!
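I think a pivot/mutate/pivot approach works well here: widen the data so each treatment becomes its own column, compute the four ratios, then pivot back to long format (`quux` is the data frame from the question):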
```r
library(dplyr) # %>%, transmute()
library(tidyr) # pivot_*

# One row per subject, one column per treatment label
pivot_wider(quux, id_cols = subID, names_from = treatment, values_from = outcome) %>%
  transmute(
    subID,
    row1 = affected_pre_2 / affected_pre_1,
    row2 = unaffected_pre_2 / unaffected_pre_1,
    row3 = affected_post_2 / affected_post_1,
    row4 = unaffected_post_2 / unaffected_post_1
  ) %>%
  # Back to long format: one row per subject and ratio
  pivot_longer(-subID, names_to = "treatment", values_to = "outcome")
# # A tibble: 12 × 3
#    subID treatment outcome
#    <int> <chr>       <dbl>
#  1 14325 row1        0.556
#  2 14325 row2        1.17
#  3 14325 row3        2.05
#  4 14325 row4        0.391
#  5 18964 row1        1.26
#  6 18964 row2        0.418
#  7 18964 row3        7.18
#  8 18964 row4        0.402
#  9 39274 row1        0.136
# 10 39274 row2        6.8
# 11 39274 row3        0.323
# 12 39274 row4        0.740
```
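The pivot returns only the new ratio rows. If you also want them inserted back into the original data, and applied to several outcome columns as with `across(5:11, ...)` in the question, a grouped approach is one option. This is a minimal sketch assuming dplyr 1.1.0 or later (for `reframe()`) and exactly one row per treatment label per subject; the helper `ratio4()` and the extra `outcome2` column are hypothetical, added only for illustration. The `summarise()` warning in the question most likely comes from dplyr 1.1.0 deprecating `summarise()` calls that return more than one row per group, with `reframe()` as the suggested replacement.

```r
library(dplyr)

# The three subjects shown in the question (8 treatment rows each)
quux <- data.frame(
  subID = rep(c(14325L, 18964L, 39274L), each = 8),
  treatment = rep(c("affected_pre_1", "affected_pre_2",
                    "unaffected_pre_1", "unaffected_pre_2",
                    "affected_post_1", "affected_post_2",
                    "unaffected_post_1", "unaffected_post_2"), times = 3),
  outcome = c(0.45, 0.25, 1.05, 1.23, 0.22, 0.45, 0.87, 0.34,
              0.90, 1.13, 0.55, 0.23, 0.17, 1.22, 1.89, 0.76,
              0.88, 0.12, 0.05, 0.34, 0.99, 0.32, 1.81, 1.34)
)
# Hypothetical second outcome column, only here to show across() on several
# columns. Scaling by 2 cancels out in a ratio, so its ratios should match.
quux$outcome2 <- quux$outcome * 2

# Hypothetical helper: within one subject's rows, divide each "_2" measurement
# by the matching "_1" measurement. Gives NA if a treatment label is missing.
ratio4 <- function(x, trt) {
  pre <- c("affected_pre", "unaffected_pre", "affected_post", "unaffected_post")
  x[match(paste0(pre, "_2"), trt)] / x[match(paste0(pre, "_1"), trt)]
}

ratios <- quux %>%
  group_by(subID) %>%   # per-subject "loop", no apply() needed
  reframe(
    across(c(outcome, outcome2), ~ ratio4(.x, treatment)),
    treatment = paste0("row", 1:4)  # created last so across() sees the original labels
  )

# Insert the new rows into the original data, keeping each subject together
result <- bind_rows(quux, ratios) %>%
  arrange(subID)
```

`group_by(subID)` handles the per-subject iteration, and the new rows carry `treatment` labels `row1` to `row4`, matching the pivot output above.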