In my dataset i have individuals with sometimes 2 timepoints, and i would like to set the difference between the 2 timepoints. The thing is that i have a lot of columns which i would like to apply this on. In my dataset i have or 1 or 2 rows per individuals, here is an example of what i have done with 2 colums :
library(tidyr)
data = data.frame(id = c(1,1,2,2,3,4,5,5,6),
time = c("M0","M3","M0","M3","M0","M0","M0","M3","M0"),
bio1 = c(4.2, 4.8, 4, NA, 3.8, 4.4, 5, 6, 6.1),
bio2 = c(12, 14, 10, 11, NA, 18, 19, 12, 15))
data
data_wide <- data %>%
pivot_wider(names_from="time",
values_from=c("bio1","bio2")) %>%
mutate(diff_bio1 = bio1_M3 - bio1_M0,
diff_bio2 = bio2_M3 - bio2_M0) %>%
select(id, diff_bio1, diff_bio2)
data_wide
I know with pivot_wider we can aggregate function on every seperated columns we create, but i didn't find a way to apply the function directly at the creation to set the difference. Is there a way to do that ?
At least using Tidyverse functions, a repeated operation across pairs of columns can be trickier than a solution that doesn't reshape the data wider:
library(tidyverse)
df = tibble(id = c(1,1,2,2,3,4,5,5,6),
time = c("M0","M3","M0","M3","M0","M0","M0","M3","M0"),
bio1 = c(4.2, 4.8, 4, NA, 3.8, 4.4, 5, 6, 6.1),
bio2 = c(12, 14, 10, 11, NA, 18, 19, 12, 15))
df |>
complete(id, time) |>
arrange(id, time) |>
summarize(across(starts_with("bio"), diff),
.by = id)
#> # A tibble: 6 × 3
#> id bio1 bio2
#> <dbl> <dbl> <dbl>
#> 1 1 0.600 2
#> 2 2 NA 1
#> 3 3 NA NA
#> 4 4 NA NA
#> 5 5 1 -7
#> 6 6 NA NA
Created on 2023-12-06 with reprex v2.0.2