I have a data frame like this (with more observations and more values of code than in this example):
code tmp wek sbd
<chr> <chr> <dbl> <dbl>
1 abc01 T1 1 7.83
2 abc01 T1 1 7.83
3 abc01 T1 2 8.5
4 abc01 T1 2 8.5
5 abc01 T1 1 7.83
6 abc01 T1 1 7.83
7 abc01 T1 1 7.83
8 abc01 T1 1 7.83
9 abc01 T1 1 7.83
10 abc01 T2 1 7.56
11 abc01 T2 1 7.56
12 abc01 T2 2 7.22
13 abc01 T2 2 7.22
14 abc01 T2 1 7.56
15 abc01 T2 1 7.56
16 abc01 T2 1 7.56
17 abc01 T2 1 7.56
18 abc01 T2 1 7.56
Now I want to calculate a new variable that gives the difference in sbd between wek = 1 and wek = 2, for each combination of code and tmp.
So far I have only found functions that give the difference between consecutive rows, which does not fit my case.
You can use match to get the corresponding sbd value at wek 1 and 2.
library(dplyr)
df %>%
  group_by(code, tmp) %>%
  summarise(diff = sbd[match(1, wek)] - sbd[match(2, wek)])
# code tmp diff
# <chr> <chr> <dbl>
#1 abc01 T1 -0.67
#2 abc01 T2 0.34
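To see why this works, here is a minimal base R sketch of what match does inside a single group. The vectors are the first five T1 rows of the example data; the names i1, i2 and d are only illustrative.

```r
# wek and sbd for part of one code/tmp group (first five T1 rows of the example)
wek <- c(1, 1, 2, 2, 1)
sbd <- c(7.83, 7.83, 8.5, 8.5, 7.83)

# match() returns the position of the FIRST occurrence of the value
i1 <- match(1, wek)   # position of the first wek == 1 row
i2 <- match(2, wek)   # position of the first wek == 2 row

# difference of sbd at those two positions: 7.83 - 8.5
d <- sbd[i1] - sbd[i2]

# if a week does not occur in the group, match() returns NA,
# so the resulting difference is NA as well
missing_week <- match(3, wek)
```

Because only the first matching row is used, this assumes sbd is constant within each code/tmp/wek combination, as it is in the example data.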
If you want to add a new column to the data frame while keeping all of its rows, use mutate instead of summarise.
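A minimal sketch of the mutate variant, assuming a trimmed-down copy of the example data (df_small and res are illustrative names, not part of the question):

```r
library(dplyr)

# trimmed-down copy of the example data: three rows per tmp level
df_small <- data.frame(
  code = rep("abc01", 6),
  tmp  = c("T1", "T1", "T1", "T2", "T2", "T2"),
  wek  = c(1, 2, 1, 1, 2, 1),
  sbd  = c(7.83, 8.5, 7.83, 7.56, 7.22, 7.56)
)

res <- df_small %>%
  group_by(code, tmp) %>%
  # the single difference per group is recycled to every row of that group
  mutate(diff = sbd[match(1, wek)] - sbd[match(2, wek)]) %>%
  ungroup()
```

Here res keeps all six rows, and every row within the same code/tmp group carries that group's difference.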
data
It is easier to help if you provide data in a reproducible format.
df <- structure(list(code = c("abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01", "abc01",
"abc01", "abc01", "abc01", "abc01", "abc01", "abc01"), tmp = c("T1",
"T1", "T1", "T1", "T1", "T1", "T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T2", "T2", "T2", "T2"), wek = c(1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), sbd = c(7.83,
7.83, 8.5, 8.5, 7.83, 7.83, 7.83, 7.83, 7.83, 7.56, 7.56, 7.22,
7.22, 7.56, 7.56, 7.56, 7.56, 7.56)),
class = "data.frame", row.names = c(NA, -18L))