Search code examples
rrowssubtraction

Substracting all columns by the values in two rows


What code should I use if I want to take all the columns in Row 2 and substract them by the values in Row 1. I'd like to do this down the entire dataset. That is, row 4 - row 3, row 6 - row 5, and so on.


Solution

  • Data

    > mtcars[1:3, ]
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    

    Using base R

    as.data.frame(lapply(mtcars[1:3, ], function(x) c(NA, diff(x))))
    
      mpg cyl disp  hp  drat     wt qsec vs am gear carb
    1  NA  NA   NA  NA    NA     NA   NA NA NA   NA   NA
    2 0.0   0    0   0  0.00  0.255 0.56  0  0    0    0
    3 1.8  -2  -52 -17 -0.05 -0.555 1.59  1  0    0   -3
    

    With dplyr:

    library(dplyr)

    Option 1

    > mutate(mtcars[1:3, ], across(everything(), ~. - lag(.)))
      mpg cyl disp  hp  drat     wt qsec vs am gear carb
    1  NA  NA   NA  NA    NA     NA   NA NA NA   NA   NA
    2 0.0   0    0   0  0.00  0.255 0.56  0  0    0    0
    3 1.8  -2  -52 -17 -0.05 -0.555 1.59  1  0    0   -3
    

    Option 2

    > mutate_all(mtcars[1:3, ], ~. - lag(.))
      mpg cyl disp  hp  drat     wt qsec vs am gear carb
    1  NA  NA   NA  NA    NA     NA   NA NA NA   NA   NA
    2 0.0   0    0   0  0.00  0.255 0.56  0  0    0    0
    3 1.8  -2  -52 -17 -0.05 -0.555 1.59  1  0    0   -3
    

    In this code, ~. - lag(.) is an anonymous function just like function(x) x - lag(x).

    In case your data contains non numeric variables, to do this without errors you can use mutate_if(your_data, is.numeric, ~. - lag(.)) or a variant with the new approach across().