
Insert new row following calculation, and loop it for multiple subjects


This feels like a basic question, but I can't seem to figure out how to do it. I want to insert new rows into my data frame after performing a calculation, and then repeat that procedure (calculation plus inserting the rows with the output) for each participant I have.

My data frame looks something like this (with 9 subjects total), where I have 8 pre-post treatment outcomes per subject:

| subID | treatment                    | outcome |
| ----- | ---------------------------- | ------- |
| 14325 | affected_pre_1               | 0.45    |
| 14325 | affected_pre_2               | 0.25    |
| 14325 | unaffected_pre_1             | 1.05    |
| 14325 | unaffected_pre_2             | 1.23    |
| 14325 | affected_post_1              | 0.22    |
| 14325 | affected_post_2              | 0.45    |
| 14325 | unaffected_post_1            | 0.87    |
| 14325 | unaffected_post_2            | 0.34    |
| 18964 | affected_pre_1               | 0.90    |
| 18964 | affected_pre_2               | 1.13    |
| 18964 | unaffected_pre_1             | 0.55    |
| 18964 | unaffected_pre_2             | 0.23    |
| 18964 | affected_post_1              | 0.17    |
| 18964 | affected_post_2              | 1.22    |
| 18964 | unaffected_post_1            | 1.89    |
| 18964 | unaffected_post_2            | 0.76    |
| 39274 | affected_pre_1               | 0.88    |
| 39274 | affected_pre_2               | 0.12    |
| 39274 | unaffected_pre_1             | 0.05    |
| 39274 | unaffected_pre_2             | 0.34    |
| 39274 | affected_post_1              | 0.99    |
| 39274 | affected_post_2              | 0.32    |
| 39274 | unaffected_post_1            | 1.81    |
| 39274 | unaffected_post_2            | 1.34    |
|  ...  | ...                          | ...     |

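For a reproducible example, the three subjects shown in the table can be rebuilt as a data frame. This is a sketch: the name data matches the attempt further down, and storing subID as an integer is an assumption.

```r
# The three subjects shown above, in long format:
# one row per subject x treatment level (8 levels per subject).
data <- data.frame(
  subID = rep(c(14325L, 18964L, 39274L), each = 8),
  treatment = rep(c("affected_pre_1", "affected_pre_2", "unaffected_pre_1",
                    "unaffected_pre_2", "affected_post_1", "affected_post_2",
                    "unaffected_post_1", "unaffected_post_2"), times = 3),
  outcome = c(0.45, 0.25, 1.05, 1.23, 0.22, 0.45, 0.87, 0.34,
              0.90, 1.13, 0.55, 0.23, 0.17, 1.22, 1.89, 0.76,
              0.88, 0.12, 0.05, 0.34, 0.99, 0.32, 1.81, 1.34),
  stringsAsFactors = FALSE
)
str(data)
```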
I want to create 4 new rows for each subject with the following calculation (essentially normalizing some of the measures):

row 1 = affected_pre_2 / affected_pre_1
row 2 = unaffected_pre_2 / unaffected_pre_1
row 3 = affected_post_2 / affected_post_1
row 4 = unaffected_post_2 / unaffected_post_1
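One way to compute these four ratios per subject without writing an explicit loop is a grouped summarise(): group_by(subID) runs the calculation once per subject. This is a minimal sketch assuming each treatment level appears exactly once per subject; the helper ratio_of() is hypothetical, not part of dplyr. It returns one wide row per subject rather than inserting rows.

```r
library(dplyr)

data <- data.frame(
  subID = rep(c(14325L, 18964L, 39274L), each = 8),
  treatment = rep(c("affected_pre_1", "affected_pre_2", "unaffected_pre_1",
                    "unaffected_pre_2", "affected_post_1", "affected_post_2",
                    "unaffected_post_1", "unaffected_post_2"), times = 3),
  outcome = c(0.45, 0.25, 1.05, 1.23, 0.22, 0.45, 0.87, 0.34,
              0.90, 1.13, 0.55, 0.23, 0.17, 1.22, 1.89, 0.76,
              0.88, 0.12, 0.05, 0.34, 0.99, 0.32, 1.81, 1.34),
  stringsAsFactors = FALSE
)

# Hypothetical helper: outcome of level `num` divided by outcome of level `den`,
# evaluated within one subject's rows.
ratio_of <- function(outcome, treatment, num, den) {
  outcome[treatment == num] / outcome[treatment == den]
}

ratios <- data %>%
  group_by(subID) %>%  # one summary row per subject
  summarise(
    row1 = ratio_of(outcome, treatment, "affected_pre_2", "affected_pre_1"),
    row2 = ratio_of(outcome, treatment, "unaffected_pre_2", "unaffected_pre_1"),
    row3 = ratio_of(outcome, treatment, "affected_post_2", "affected_post_1"),
    row4 = ratio_of(outcome, treatment, "unaffected_post_2", "unaffected_post_1")
  )
ratios
```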

Based on a previous answer, I've tried the following:

    data %>%
      bind_rows(data %>% summarise(subID = str_c(subID, lead(subID), sep = '_')) %>% na.omit() %>%
        bind_cols(data %>% summarise(across(5:11, ~ . - lead(.))) %>% na.omit()))

The across(5:11 ...) refers to the multiple outcome variables I have (in my example dataset, I just included one outcome variable for simplicity).

However, I got a warning that returning more than one row per summarise() group is deprecated. Is there a better way to do this?
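In dplyr 1.1.0 and later, that warning points to reframe(), which is meant for summaries that return several rows per group. The sketch below assumes dplyr >= 1.1.0; num and den are hypothetical lookup vectors. It returns the four ratio rows per subject in long format. The across(!treatment, ...) call applies the same ratio to every non-grouping column except treatment, so it would also cover several outcome columns like the across(5:11, ...) in the attempt above.

```r
library(dplyr)  # reframe() requires dplyr >= 1.1.0

data <- data.frame(
  subID = rep(c(14325L, 18964L, 39274L), each = 8),
  treatment = rep(c("affected_pre_1", "affected_pre_2", "unaffected_pre_1",
                    "unaffected_pre_2", "affected_post_1", "affected_post_2",
                    "unaffected_post_1", "unaffected_post_2"), times = 3),
  outcome = c(0.45, 0.25, 1.05, 1.23, 0.22, 0.45, 0.87, 0.34,
              0.90, 1.13, 0.55, 0.23, 0.17, 1.22, 1.89, 0.76,
              0.88, 0.12, 0.05, 0.34, 0.99, 0.32, 1.81, 1.34),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vectors: numerator and denominator level for row1..row4.
num <- c("affected_pre_2", "unaffected_pre_2", "affected_post_2", "unaffected_post_2")
den <- c("affected_pre_1", "unaffected_pre_1", "affected_post_1", "unaffected_post_1")

new_rows <- data %>%
  group_by(subID) %>%
  reframe(
    # Ratios first, while `treatment` still refers to the original column;
    # grouping columns are excluded from across() automatically.
    across(!treatment, ~ .x[match(num, treatment)] / .x[match(den, treatment)]),
    # Label the 4 rows produced for each subject.
    treatment = paste0("row", 1:4)
  ) %>%
  relocate(treatment, .after = subID)
new_rows
```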

I think I might also need to use the apply function somewhere in there, but I'm not sure how to loop everything through each subject then. Any help would be greatly appreciated!
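If you do want an explicit apply-style loop, a base R sketch is to split() the data by subject, compute the four ratios for each piece with lapply(), and stack the results. The lookup vectors num and den are hypothetical, and each treatment level is assumed to appear once per subject.

```r
data <- data.frame(
  subID = rep(c(14325L, 18964L, 39274L), each = 8),
  treatment = rep(c("affected_pre_1", "affected_pre_2", "unaffected_pre_1",
                    "unaffected_pre_2", "affected_post_1", "affected_post_2",
                    "unaffected_post_1", "unaffected_post_2"), times = 3),
  outcome = c(0.45, 0.25, 1.05, 1.23, 0.22, 0.45, 0.87, 0.34,
              0.90, 1.13, 0.55, 0.23, 0.17, 1.22, 1.89, 0.76,
              0.88, 0.12, 0.05, 0.34, 0.99, 0.32, 1.81, 1.34),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vectors: numerator and denominator level for row1..row4.
num <- c("affected_pre_2", "unaffected_pre_2", "affected_post_2", "unaffected_post_2")
den <- c("affected_pre_1", "unaffected_pre_1", "affected_post_1", "unaffected_post_1")

# One list element per subject; each holds that subject's 4 ratio rows.
per_subject <- lapply(split(data, data$subID), function(d) {
  o <- setNames(d$outcome, d$treatment)  # outcome indexed by treatment name
  data.frame(
    subID = d$subID[1],
    treatment = paste0("row", 1:4),
    outcome = unname(o[num] / o[den]),
    stringsAsFactors = FALSE
  )
})

new_rows <- do.call(rbind, per_subject)  # stack the per-subject pieces
rownames(new_rows) <- NULL
new_rows
```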


Solution

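I think a pivot/transmute/pivot pipeline should work well here, where quux holds the data from the question:

    library(dplyr) # %>%, transmute()
    library(tidyr) # pivot_*()
    # widen: one row per subject, one column per treatment level
    pivot_wider(quux, id_cols = subID, names_from = treatment, values_from = outcome) %>%
      # compute the four ratios, keeping only subID and the new columns
      transmute(
        subID,
        row1 = affected_pre_2 / affected_pre_1,
        row2 = unaffected_pre_2 / unaffected_pre_1,
        row3 = affected_post_2 / affected_post_1,
        row4 = unaffected_post_2 / unaffected_post_1
      ) %>%
      # back to long format: one row per subject and ratio
      pivot_longer(-subID, names_to = "treatment", values_to = "outcome")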
    # # A tibble: 12 × 3
    #    subID treatment outcome
    #    <int> <chr>       <dbl>
    #  1 14325 row1        0.556
    #  2 14325 row2        1.17 
    #  3 14325 row3        2.05 
    #  4 14325 row4        0.391
    #  5 18964 row1        1.26 
    #  6 18964 row2        0.418
    #  7 18964 row3        7.18 
    #  8 18964 row4        0.402
    #  9 39274 row1        0.136
    # 10 39274 row2        6.8  
    # 11 39274 row3        0.323
    # 12 39274 row4        0.740
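The pipeline above returns only the new ratio rows. To actually insert them into the original data, one option (a sketch, assuming the data is named data and that new rows should follow each subject's original 8 rows) is to bind_rows() and re-sort by subject. The temporary row_id column keeps the original order within each subject without relying on sort stability.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  subID = rep(c(14325L, 18964L, 39274L), each = 8),
  treatment = rep(c("affected_pre_1", "affected_pre_2", "unaffected_pre_1",
                    "unaffected_pre_2", "affected_post_1", "affected_post_2",
                    "unaffected_post_1", "unaffected_post_2"), times = 3),
  outcome = c(0.45, 0.25, 1.05, 1.23, 0.22, 0.45, 0.87, 0.34,
              0.90, 1.13, 0.55, 0.23, 0.17, 1.22, 1.89, 0.76,
              0.88, 0.12, 0.05, 0.34, 0.99, 0.32, 1.81, 1.34),
  stringsAsFactors = FALSE
)

# Same pivot/transmute/pivot as in the answer.
new_rows <- data %>%
  pivot_wider(id_cols = subID, names_from = treatment, values_from = outcome) %>%
  transmute(
    subID,
    row1 = affected_pre_2 / affected_pre_1,
    row2 = unaffected_pre_2 / unaffected_pre_1,
    row3 = affected_post_2 / affected_post_1,
    row4 = unaffected_post_2 / unaffected_post_1
  ) %>%
  pivot_longer(-subID, names_to = "treatment", values_to = "outcome")

# Append the ratio rows, then sort so each subject's 4 new rows
# come right after its 8 original rows.
combined <- bind_rows(data, new_rows) %>%
  mutate(row_id = row_number()) %>%
  arrange(subID, row_id) %>%
  select(-row_id)
combined
```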