I have a dataframe in R like this:
id year othercolumns
1 2017 ...
2 2017 ...
1 2018 ...
2 2018 ...
3 2018 ...
4 2018 ...
1 2019 ...
2 2019 ...
3 2019 ...
4 2019 ...
5 2019 ...
I need to keep one row per id: the record from the first year in which that id appears. The result I need is this:
id year othercolumns
1 2017 ...
2 2017 ...
3 2018 ...
4 2018 ...
5 2019 ...
My data can have any start year, but the end will always be 2020.
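Using dplyr, group by id and take the first year in each group. Note that first() relies on row order, so this assumes the rows are already sorted by year, as in the example: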
df <- data.frame(
  id = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019)
)

library(dplyr)

# One row per id, keeping the first year listed for that id
df %>%
  group_by(id) %>%
  summarise(year = first(year))
#> # A tibble: 5 × 2
#>      id  year
#>   <dbl> <dbl>
#> 1     1  2017
#> 2     2  2017
#> 3     3  2018
#> 4     4  2018
#> 5     5  2019
Created on 2022-05-10 by the reprex package (v2.0.1)
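The summarise() call above returns only id and year, while the question also wants the other columns of the matching row. One possible variant, sketched here under assumptions, is to keep whole rows with slice_min() instead. The value column is a hypothetical stand-in for "othercolumns", and first_rows is just an illustrative name; slice_min() needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical sample data: `value` stands in for the question's "othercolumns".
df <- data.frame(
  id    = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year  = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  value = letters[1:11]
)

# For each id, keep the entire row with the smallest year.
# with_ties = FALSE guarantees a single row per id even if an id
# appears more than once within its first year.
first_rows <- df %>%
  group_by(id) %>%
  slice_min(year, n = 1, with_ties = FALSE) %>%
  ungroup()

first_rows
```

Because slice_min() picks the minimum year itself, this version does not depend on the rows being sorted beforehand.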