rpivottidyr

Aggregate the columns we create with pivot_wider (set a difference)


In my dataset i have individuals with sometimes 2 timepoints, and i would like to set the difference between the 2 timepoints. The thing is that i have a lot of columns which i would like to apply this on. In my dataset i have or 1 or 2 rows per individuals, here is an example of what i have done with 2 colums :

library(tidyr)
data = data.frame(id = c(1,1,2,2,3,4,5,5,6),
                  time = c("M0","M3","M0","M3","M0","M0","M0","M3","M0"),
                  bio1 = c(4.2, 4.8, 4, NA, 3.8, 4.4, 5, 6, 6.1),
                  bio2 = c(12, 14, 10, 11, NA, 18, 19, 12, 15))
data 

data_wide <- data %>% 
  pivot_wider(names_from="time",
              values_from=c("bio1","bio2")) %>%
  mutate(diff_bio1 = bio1_M3 - bio1_M0,
         diff_bio2 = bio2_M3 - bio2_M0) %>%
  select(id, diff_bio1, diff_bio2)
data_wide              

I know with pivot_wider we can aggregate function on every seperated columns we create, but i didn't find a way to apply the function directly at the creation to set the difference. Is there a way to do that ?


Solution

  • At least using Tidyverse functions, a repeated operation across pairs of columns can be trickier than a solution that doesn't reshape the data wider:

    library(tidyverse)
    
    df = tibble(id = c(1,1,2,2,3,4,5,5,6),
                      time = c("M0","M3","M0","M3","M0","M0","M0","M3","M0"),
                      bio1 = c(4.2, 4.8, 4, NA, 3.8, 4.4, 5, 6, 6.1),
                      bio2 = c(12, 14, 10, 11, NA, 18, 19, 12, 15))
    
    df |> 
      complete(id, time) |> 
      arrange(id, time) |> 
      summarize(across(starts_with("bio"), diff),
                .by = id)
    #> # A tibble: 6 × 3
    #>      id   bio1  bio2
    #>   <dbl>  <dbl> <dbl>
    #> 1     1  0.600     2
    #> 2     2 NA         1
    #> 3     3 NA        NA
    #> 4     4 NA        NA
    #> 5     5  1        -7
    #> 6     6 NA        NA
    

    Created on 2023-12-06 with reprex v2.0.2