Search code examples
rtidyrplyr

Calculate pairwise variable-differences in a data.frame/tibble


I have the following dataframe consisting of different variables (A, X, L) that were measured at two different times (t1 and t2). Of course, in the real data there are many more variables (just for the sake of the example)

n <- 100

df <- data.frame(A_t1 = runif(n, min = 1, max = 5),
                 A_t2 = runif(n, min = 1, max = 5),
                 X_t1 = runif(n, min = 1, max = 5),
                 X_t2 = runif(n, min = 1, max = 5),
                 L_t1 = runif(n, min = 1, max = 5),
                 L_t2 = runif(n, min = 1, max = 5)
) %>% pivot_longer(cols = starts_with(c("A", "X", "L")))

> df
# A tibble: 600 × 2
   name  value
   <chr> <dbl>
 1 A_t1   4.15
 2 A_t2   2.20
 3 X_t1   4.49
 4 X_t2   3.84
 5 L_t1   1.94
 6 L_t2   1.75
 7 A_t1   3.16
 8 A_t2   1.71
 9 X_t1   2.20
10 X_t2   4.47

What's the easiest way to compute the difference (A_delta, X_delta, L_delta) between t2 and t1 for each variable? (It can be either a long or in a wide format, but not absolute differences). I have been thinking about using group_by() and then lag(), but this seems not to be a very elegant solution.

Thanks for help!


Solution

  • Here is one way without pivoting:

    library(dplyr)
    library(stringr)
    df %>%
      mutate(across(ends_with('_t2'), ~ . -
                      get(str_replace(cur_column(), "t2$", "t1")), .names = "diff_{.col}")) %>%
      rename_at(vars(starts_with('diff')), ~ str_remove(., "\\_t2"))
    
    
        A_t1  A_t2  X_t1  X_t2  L_t1  L_t2 diff_A diff_X  diff_L
       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>
     1  1.87  3.07  2.81  2.00  4.25  2.92  1.20  -0.809 -1.33  
     2  1.67  3.33  4.93  4.70  1.59  3.06  1.66  -0.225  1.47  
     3  2.51  2.85  2.88  1.12  4.27  2.63  0.338 -1.75  -1.64  
     4  2.92  3.48  3.04  2.85  2.14  3.66  0.565 -0.189  1.53  
     5  1.58  4.76  3.82  2.55  3.52  3.63  3.18  -1.27   0.104 
     6  3.49  3.95  3.58  4.32  1.65  3.67  0.458  0.738  2.02  
     7  1.10  2.67  3.91  4.96  3.08  2.08  1.57   1.06  -0.999 
     8  3.58  1.33  1.55  2.25  4.38  4.47 -2.26   0.704  0.0950
     9  3.05  4.37  3.39  1.71  2.98  4.69  1.32  -1.68   1.71  
    10  1.17  3.83  4.92  2.15  1.74  4.00  2.67  -2.77   2.26  
    # … with 90 more rows
    # ℹ Use `print(n = ...)` to see more rows