
Calculate difference between multiple rows by a group in R


I have a data frame like this (with more observations and more values of code than in this example):

  code  tmp     wek   sbd
   <chr> <chr> <dbl> <dbl>
 1 abc01 T1        1  7.83
 2 abc01 T1        1  7.83
 3 abc01 T1        2  8.5 
 4 abc01 T1        2  8.5 
 5 abc01 T1        1  7.83
 6 abc01 T1        1  7.83
 7 abc01 T1        1  7.83
 8 abc01 T1        1  7.83
 9 abc01 T1        1  7.83
10 abc01 T2        1  7.56
11 abc01 T2        1  7.56
12 abc01 T2        2  7.22
13 abc01 T2        2  7.22
14 abc01 T2        1  7.56
15 abc01 T2        1  7.56
16 abc01 T2        1  7.56
17 abc01 T2        1  7.56
18 abc01 T2        1  7.56

Now I want to calculate a new variable that gives the difference in sbd between wek = 1 and wek = 2 within each combination of code and tmp.

So far I have only found functions that give the difference between consecutive rows, which does not fit my case.


Solution

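  • You can use match to get the sbd value at wek 1 and at wek 2 within each group. match(1, wek) returns the position of the first row where wek is 1, so sbd[match(1, wek)] is the sbd value for that week.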

    library(dplyr)
    
    df %>%
      group_by(code, tmp) %>%
      summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])
    
    #  code  tmp    diff
    #  <chr> <chr> <dbl>
    #1 abc01 T1    -0.67
    #2 abc01 T2     0.34
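
To see what match contributes here, the following base R sketch runs the same indexing on a single group's values. The vectors are copied from the T1 rows of the example data; the variable names first_w1, first_w2, diff_12 and missing_w3 are made up for illustration. It also shows that a week absent from a group yields NA rather than an error.

```r
# wek and sbd values for one group (the first five T1 rows of the example)
wek <- c(1, 1, 2, 2, 1)
sbd <- c(7.83, 7.83, 8.5, 8.5, 7.83)

# match() returns the position of the FIRST element equal to the target
first_w1 <- match(1, wek)
first_w2 <- match(2, wek)

# Difference of sbd between week 1 and week 2 for this group
diff_12 <- sbd[first_w1] - sbd[first_w2]

# If a week does not occur in the group, match() returns NA,
# and indexing with NA gives NA, so the difference would be NA too
missing_w3 <- match(3, wek)
sbd[missing_w3]
```

Because only the first match is used, this approach assumes sbd is constant within each code/tmp/wek combination, as it is in the example data.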
    

    If you want to add the difference as a new column while keeping all the original rows, use mutate instead of summarise.
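
A minimal sketch of the mutate variant, assuming dplyr is installed. The small data frame is a shortened stand-in for the question's data (same columns, fewer rows), and df_diff is just an illustrative name.

```r
library(dplyr)

# Shortened stand-in for the question's data: one code, two tmp groups
df <- data.frame(
  code = "abc01",
  tmp  = c("T1", "T1", "T1", "T2", "T2", "T2"),
  wek  = c(1, 2, 1, 1, 2, 1),
  sbd  = c(7.83, 8.5, 7.83, 7.56, 7.22, 7.56)
)

# mutate() keeps every row and repeats the group's difference on each of them
df_diff <- df %>%
  group_by(code, tmp) %>%
  mutate(diff = sbd[match(1, wek)] - sbd[match(2, wek)]) %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident

df_diff
```

The result has the same number of rows as df, with diff equal to about -0.67 on the T1 rows and about 0.34 on the T2 rows.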

    data

    It is easier to help if you provide data in a reproducible format, for example the output of dput(df):

    df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01", 
    "abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01", 
    "abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1", 
    "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2", 
    "T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83, 
    7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22, 
    7.22, 7.56, 7.56, 7.56, 7.56, 7.56)), 
    class = "data.frame", row.names = c(NA, -18L))