I want to create a lag variable for a value that is nested in three groups:
For example:
df <- data.frame(wave = c(1,1,1,1,1,1,2,2,2,2,2,2),
party = rep(c("A", "A", "A", "B", "B", "B"), 2),
inc = rep(c(1,2,3), 4),
value = c(1, 10, 100, 3, 30, 300, 6, 60, 600, 7, 70, 700))
Data:
wave party inc value
1 1 A 1 1
2 1 A 2 10
3 1 A 3 100
4 1 B 1 3
5 1 B 2 30
6 1 B 3 300
7 2 A 1 6
8 2 A 2 60
9 2 A 3 600
10 2 B 1 7
11 2 B 2 70
12 2 B 3 700
What I need is the following:
wave party inc value lag
1 1 A 1 1 NA
2 1 A 2 10 NA
3 1 A 3 100 NA
4 1 B 1 3 NA
5 1 B 2 30 NA
6 1 B 3 300 NA
7 2 A 1 6 1
8 2 A 2 60 10
9 2 A 3 600 100
10 2 B 1 7 3
11 2 B 2 70 30
12 2 B 3 700 300
Where a respondent of income group (inc) 1, of party A in wave 2 has the lagged value of inc 1, party A in wave 1, etc.
I tried:
df %>% group_by(wave) %>% mutate(lag = lag(value))
Which gives me:
wave party inc value lag
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 1 1 NA
2 1 A 2 10 1
3 1 A 3 100 10
4 1 B 1 3 100
5 1 B 2 30 3
6 1 B 3 300 30
7 2 A 1 6 NA
8 2 A 2 60 6
9 2 A 3 600 60
10 2 B 1 7 600
11 2 B 2 70 7
12 2 B 3 700 70
I tried:
df %>% group_by(party, wave) %>% mutate(lag = lag(value))
Which gives me:
wave party inc value lag
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 1 1 NA
2 1 A 2 10 1
3 1 A 3 100 10
4 1 B 1 3 NA
5 1 B 2 30 3
6 1 B 3 300 30
7 2 A 1 6 NA
8 2 A 2 60 6
9 2 A 3 600 60
10 2 B 1 7 NA
11 2 B 2 70 7
12 2 B 3 700 70
I tried:
df %>% group_by(party, wave, inc) %>% mutate(lag = lag(value))
Which gives me:
wave party inc value lag
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 1 1 NA
2 1 A 2 10 NA
3 1 A 3 100 NA
4 1 B 1 3 NA
5 1 B 2 30 NA
6 1 B 3 300 NA
7 2 A 1 6 NA
8 2 A 2 60 NA
9 2 A 3 600 NA
10 2 B 1 7 NA
11 2 B 2 70 NA
12 2 B 3 700 NA
I can continue like this. I tried different versions using df %>% arrange() and the order_by() function within lag. But for some reason I cannot figure out how to get the right lagged variable.
You could achieve your desired result by grouping only by party
and inc
:
library(dplyr)
df <- data.frame(wave = c(1,1,1,1,1,1,2,2,2,2,2,2),
party = rep(c("A", "A", "A", "B", "B", "B"), 2),
inc = rep(c(1,2,3), 4),
value = c(1, 10, 100, 3, 30, 300, 6, 60, 600, 7, 70, 700))
df %>%
group_by(party, inc) %>%
mutate(lag = lag(value)) %>%
ungroup()
#> # A tibble: 12 x 5
#> wave party inc value lag
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 A 1 1 NA
#> 2 1 A 2 10 NA
#> 3 1 A 3 100 NA
#> 4 1 B 1 3 NA
#> 5 1 B 2 30 NA
#> 6 1 B 3 300 NA
#> 7 2 A 1 6 1
#> 8 2 A 2 60 10
#> 9 2 A 3 600 100
#> 10 2 B 1 7 3
#> 11 2 B 2 70 30
#> 12 2 B 3 700 300