Difference of second and following column


I'm fairly new to R and am trying to write a function that creates a new data frame containing the difference between the second column and every following column of the original dataset. Imagine this was my data (although I have many more variables):

obs  var1   var2   var3    
1     5      10     14   
2     6      11     15   
3     7      12     16   
4     8      13     17    

The output should look something like this:

obs var2_1 var3_1
1     -5       -9
2     -5       -9
3     -5       -9
4     -5       -9

Thank you very much in advance!


Solution

  • You can use across() to apply the same transformation to multiple columns:

    library(tidyverse)
    
    df <- tibble::tribble(
      ~obs, ~var1, ~var2, ~var3,
      1,  5,   10,   14,
      2,  6,   11,   15,
      3,  7,   12,   16,
      4,  8,   13,   17
      )
    
    mutate(df, across(starts_with("var") & !var1, ~var1 - .x))
    #> # A tibble: 4 × 4
    #>     obs  var1  var2  var3
    #>   <dbl> <dbl> <dbl> <dbl>
    #> 1     1     5    -5    -9
    #> 2     2     6    -5    -9
    #> 3     3     7    -5    -9
    #> 4     4     8    -5    -9
    

    Created on 2023-03-20 with reprex v2.0.2

    Add the option .keep = "unused" to mutate() to remove var1 from the output.

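    Update: To refer to columns by position instead of by name, select everything after the second column with !1:2 and take the second column as the reference with pick(2):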

    mutate(df, across(!1:2, ~ pull(pick(2), 1) - .x))
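
The question asks for a function and for output columns named var2_1 and var3_1, so here is a minimal sketch of one way to wrap the across() approach above. The names diff_from_col, ref and suffix are hypothetical and introduced only for illustration; they are not part of dplyr. The sketch assumes the reference column sits at position ref, that every later column is numeric, and that no column is literally named ref_values.

```r
library(dplyr)

df <- tibble::tribble(
  ~obs, ~var1, ~var2, ~var3,
  1, 5, 10, 14,
  2, 6, 11, 15,
  3, 7, 12, 16,
  4, 8, 13, 17
)

# Hypothetical helper (not from the answer above): subtract every column
# after position `ref` from the column at position `ref`.
diff_from_col <- function(data, ref = 2, suffix = "_1") {
  before <- names(data)[seq_len(ref - 1)]  # columns to keep as-is, e.g. obs
  later  <- names(data)[-seq_len(ref)]     # columns after the reference
  ref_values <- data[[ref]]                # reference column as a vector

  # .names writes the differences to new columns instead of overwriting
  out <- mutate(
    data,
    across(all_of(later), ~ ref_values - .x, .names = paste0("{.col}", suffix))
  )

  # Keep only the leading columns and the newly created differences
  select(out, all_of(before), all_of(paste0(later, suffix)))
}

result <- diff_from_col(df)
print(result)
```

With the example data this should return the columns obs, var2_1 and var3_1. Selecting the new columns by their constructed names, rather than with ends_with(), avoids picking up unrelated columns that already end in the suffix.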