I have the following dataframe consisting of different variables (A, X, L) that were measured at two different times (t1 and t2). Of course, in the real data there are many more variables (just for the sake of the example)
n <- 100
df <- data.frame(A_t1 = runif(n, min = 1, max = 5),
A_t2 = runif(n, min = 1, max = 5),
X_t1 = runif(n, min = 1, max = 5),
X_t2 = runif(n, min = 1, max = 5),
L_t1 = runif(n, min = 1, max = 5),
L_t2 = runif(n, min = 1, max = 5)
) %>% pivot_longer(cols = starts_with(c("A", "X", "L")))
> df
# A tibble: 600 × 2
name value
<chr> <dbl>
1 A_t1 4.15
2 A_t2 2.20
3 X_t1 4.49
4 X_t2 3.84
5 L_t1 1.94
6 L_t2 1.75
7 A_t1 3.16
8 A_t2 1.71
9 X_t1 2.20
10 X_t2 4.47
What's the easiest way to compute the difference (A_delta, X_delta, L_delta) between t2 and t1 for each variable? (It can be either a long or in a wide format, but not absolute differences). I have been thinking about using group_by()
and then lag()
, but this seems not to be a very elegant solution.
Thanks for help!
Here is one way without pivoting:
library(dplyr)
library(stringr)
df %>%
mutate(across(ends_with('_t2'), ~ . -
get(str_replace(cur_column(), "t2$", "t1")), .names = "diff_{.col}")) %>%
rename_at(vars(starts_with('diff')), ~ str_remove(., "\\_t2"))
A_t1 A_t2 X_t1 X_t2 L_t1 L_t2 diff_A diff_X diff_L
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.87 3.07 2.81 2.00 4.25 2.92 1.20 -0.809 -1.33
2 1.67 3.33 4.93 4.70 1.59 3.06 1.66 -0.225 1.47
3 2.51 2.85 2.88 1.12 4.27 2.63 0.338 -1.75 -1.64
4 2.92 3.48 3.04 2.85 2.14 3.66 0.565 -0.189 1.53
5 1.58 4.76 3.82 2.55 3.52 3.63 3.18 -1.27 0.104
6 3.49 3.95 3.58 4.32 1.65 3.67 0.458 0.738 2.02
7 1.10 2.67 3.91 4.96 3.08 2.08 1.57 1.06 -0.999
8 3.58 1.33 1.55 2.25 4.38 4.47 -2.26 0.704 0.0950
9 3.05 4.37 3.39 1.71 2.98 4.69 1.32 -1.68 1.71
10 1.17 3.83 4.92 2.15 1.74 4.00 2.67 -2.77 2.26
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows