Search code examples
rdplyrdata.tablepanel

Using dplyr in R to check whether observations are present in the next time period in panel data


I have a dataset which looks something like this:

ID Year Set Value
1 2020 A 20
1 2020 B 30
1 2021 C 28
1 2021 D 21
1 2022 E 28
1 2022 F 21
2 2020 A 20
2 2020 B 30
2 2021 C 28
2 2021 D 21
3 2020 A 20
3 2020 B 30

So it's a panel dataset, which has people (indexed by ID), observed over multiple years, with different values of different things. I want to create a variable which takes on a value of TRUE if the person is present in the following year (like 1 for both years & 2 for 2020), and FALSE if it isn't (like 3 in 2020 and 2 in 2021). And there are a lot more than three years for some of them, but only one year for some. I feel like this should be easy using either dplyr or data.table, but I can't wrap my head around it. Any help would be most welcome!


Solution

  • Here is one potential solution:

    library(tidyverse)
    
    df <- tibble::tribble(
            ~ID, ~Year, ~Set, ~Value,
             1L, 2020L,  "A",    20L,
             1L, 2020L,  "B",    30L,
             1L, 2021L,  "C",    28L,
             1L, 2021L,  "D",    21L,
             1L, 2022L,  "E",    28L,
             1L, 2022L,  "F",    21L,
             2L, 2020L,  "A",    20L,
             2L, 2020L,  "B",    30L,
             2L, 2021L,  "C",    28L,
             2L, 2021L,  "D",    21L,
             3L, 2020L,  "A",    20L,
             3L, 2020L,  "B",    30L
            )
    
    df %>%
      group_by(ID) %>%
      mutate(present_in_following_year = Year != max(Year))
    #> # A tibble: 12 × 5
    #> # Groups:   ID [3]
    #>       ID  Year Set   Value present_in_following_year
    #>    <int> <int> <chr> <int> <lgl>                    
    #>  1     1  2020 A        20 TRUE                     
    #>  2     1  2020 B        30 TRUE                     
    #>  3     1  2021 C        28 TRUE                     
    #>  4     1  2021 D        21 TRUE                     
    #>  5     1  2022 E        28 FALSE                    
    #>  6     1  2022 F        21 FALSE                    
    #>  7     2  2020 A        20 TRUE                     
    #>  8     2  2020 B        30 TRUE                     
    #>  9     2  2021 C        28 FALSE                    
    #> 10     2  2021 D        21 FALSE                    
    #> 11     3  2020 A        20 FALSE                    
    #> 12     3  2020 B        30 FALSE
    

    Created on 2022-12-06 with reprex v2.0.2

    Does that solve your problem, or have I misunderstood the question?