I have a dataset which looks something like this:
ID | Year | Set | Value |
---|---|---|---|
1 | 2020 | A | 20 |
1 | 2020 | B | 30 |
1 | 2021 | C | 28 |
1 | 2021 | D | 21 |
1 | 2022 | E | 28 |
1 | 2022 | F | 21 |
2 | 2020 | A | 20 |
2 | 2020 | B | 30 |
2 | 2021 | C | 28 |
2 | 2021 | D | 21 |
3 | 2020 | A | 20 |
3 | 2020 | B | 30 |
So it's a panel dataset, which has people (indexed by ID), observed over multiple years, with different values of different things. I want to create a variable which takes on a value of TRUE if the person is present in the following year (like 1 for both years & 2 for 2020), and FALSE if it isn't (like 3 in 2020 and 2 in 2021). And there are a lot more than three years for some of them, but only one year for some. I feel like this should be easy using either dplyr or data.table, but I can't wrap my head around it. Any help would be most welcome!
Here is one potential solution:
library(tidyverse)
df <- tibble::tribble(
~ID, ~Year, ~Set, ~Value,
1L, 2020L, "A", 20L,
1L, 2020L, "B", 30L,
1L, 2021L, "C", 28L,
1L, 2021L, "D", 21L,
1L, 2022L, "E", 28L,
1L, 2022L, "F", 21L,
2L, 2020L, "A", 20L,
2L, 2020L, "B", 30L,
2L, 2021L, "C", 28L,
2L, 2021L, "D", 21L,
3L, 2020L, "A", 20L,
3L, 2020L, "B", 30L
)
df %>%
group_by(ID) %>%
mutate(present_in_following_year = Year != max(Year))
#> # A tibble: 12 × 5
#> # Groups: ID [3]
#> ID Year Set Value present_in_following_year
#> <int> <int> <chr> <int> <lgl>
#> 1 1 2020 A 20 TRUE
#> 2 1 2020 B 30 TRUE
#> 3 1 2021 C 28 TRUE
#> 4 1 2021 D 21 TRUE
#> 5 1 2022 E 28 FALSE
#> 6 1 2022 F 21 FALSE
#> 7 2 2020 A 20 TRUE
#> 8 2 2020 B 30 TRUE
#> 9 2 2021 C 28 FALSE
#> 10 2 2021 D 21 FALSE
#> 11 3 2020 A 20 FALSE
#> 12 3 2020 B 30 FALSE
Created on 2022-12-06 with reprex v2.0.2
Does that solve your problem, or have I misunderstood the question?