Search code examples
rdplyrgroup-bylag

dplyr lag function multiple nested data


I want to create a lag variable for a value that is nested in three groups:

For example:

df <- data.frame(wave = c(1,1,1,1,1,1,2,2,2,2,2,2),
                 party = rep(c("A", "A", "A", "B", "B", "B"), 2),
                 inc = rep(c(1,2,3), 4), 
                 value = c(1, 10, 100, 3, 30, 300, 6, 60, 600, 7, 70, 700)) 

Data:

   wave party inc value
1     1     A   1     1
2     1     A   2    10
3     1     A   3   100
4     1     B   1     3
5     1     B   2    30
6     1     B   3   300
7     2     A   1     6
8     2     A   2    60
9     2     A   3   600
10    2     B   1     7
11    2     B   2    70
12    2     B   3   700

What I need is the following:

   wave party inc value lag
1     1     A   1     1  NA
2     1     A   2    10  NA
3     1     A   3   100  NA
4     1     B   1     3  NA
5     1     B   2    30  NA
6     1     B   3   300  NA
7     2     A   1     6   1
8     2     A   2    60  10
9     2     A   3   600 100
10    2     B   1     7   3
11    2     B   2    70  30
12    2     B   3   700 300

Where a respondent of income group (inc) 1, of party A in wave 2 has the lagged value of inc 1, party A in wave 1, etc.

I tried:

df %>% group_by(wave) %>% mutate(lag = lag(value))

Which gives me:

    wave party   inc value   lag
   <dbl> <chr> <dbl> <dbl> <dbl>
 1     1 A         1     1    NA
 2     1 A         2    10     1
 3     1 A         3   100    10
 4     1 B         1     3   100
 5     1 B         2    30     3
 6     1 B         3   300    30
 7     2 A         1     6    NA
 8     2 A         2    60     6
 9     2 A         3   600    60
10     2 B         1     7   600
11     2 B         2    70     7
12     2 B         3   700    70

I tried:

df %>% group_by(party, wave) %>% mutate(lag = lag(value))

Which gives me:

    wave party   inc value   lag
   <dbl> <chr> <dbl> <dbl> <dbl>
 1     1 A         1     1    NA
 2     1 A         2    10     1
 3     1 A         3   100    10
 4     1 B         1     3    NA
 5     1 B         2    30     3
 6     1 B         3   300    30
 7     2 A         1     6    NA
 8     2 A         2    60     6
 9     2 A         3   600    60
10     2 B         1     7    NA
11     2 B         2    70     7
12     2 B         3   700    70

I tried:

df %>% group_by(party, wave, inc) %>% mutate(lag = lag(value))

Which gives me:

    wave party   inc value   lag
   <dbl> <chr> <dbl> <dbl> <dbl>
 1     1 A         1     1    NA
 2     1 A         2    10    NA
 3     1 A         3   100    NA
 4     1 B         1     3    NA
 5     1 B         2    30    NA
 6     1 B         3   300    NA
 7     2 A         1     6    NA
 8     2 A         2    60    NA
 9     2 A         3   600    NA
10     2 B         1     7    NA
11     2 B         2    70    NA
12     2 B         3   700    NA

I can continue like this. I tried different versions using df %>% arrange() and the order_by() function within lag. But for some reason I cannot figure out how to get the right lagged variable.


Solution

  • You could achieve your desired result by grouping only by party and inc:

    library(dplyr)
    
    df <- data.frame(wave = c(1,1,1,1,1,1,2,2,2,2,2,2),
                     party = rep(c("A", "A", "A", "B", "B", "B"), 2),
                     inc = rep(c(1,2,3), 4), 
                     value = c(1, 10, 100, 3, 30, 300, 6, 60, 600, 7, 70, 700))
    
    df %>% 
      group_by(party, inc) %>% 
      mutate(lag = lag(value)) %>% 
      ungroup()
    #> # A tibble: 12 x 5
    #>     wave party   inc value   lag
    #>    <dbl> <chr> <dbl> <dbl> <dbl>
    #>  1     1 A         1     1    NA
    #>  2     1 A         2    10    NA
    #>  3     1 A         3   100    NA
    #>  4     1 B         1     3    NA
    #>  5     1 B         2    30    NA
    #>  6     1 B         3   300    NA
    #>  7     2 A         1     6     1
    #>  8     2 A         2    60    10
    #>  9     2 A         3   600   100
    #> 10     2 B         1     7     3
    #> 11     2 B         2    70    30
    #> 12     2 B         3   700   300