
Unique rows with two columns in R


I have a dataframe in R like this:

id  year othercolumns
1   2017 ...
2   2017 ...
1   2018 ...
2   2018 ...
3   2018 ...
4   2018 ...
1   2019 ...
2   2019 ...
3   2019 ...
4   2019 ...
5   2019 ...

I need to keep one row per id: the record from the first year in which that id appears. The result I need is this:

id year othercolumns
1  2017 ...
2  2017 ...
3  2018 ...
4  2018 ...
5  2019 ...

My data can have any start year, but the end will always be 2020.


Solution

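  • Using dplyr, group by id and take the first year in each group. Note that this returns only id and year, and it relies on the rows already being sorted by year, as they are in the example:

    library(dplyr)

    # Sample data matching the question (other columns omitted)
    df <- data.frame(
      id = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
      year = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019)
    )

    # first() takes the first value in each group, so the data must be
    # sorted by year; use min(year) instead if the order is not guaranteed
    df %>%
      group_by(id) %>%
      summarise(year = first(year))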
    #> # A tibble: 5 × 2
    #>      id  year
    #>   <dbl> <dbl>
    #> 1     1  2017
    #> 2     2  2017
    #> 3     3  2018
    #> 4     4  2018
    #> 5     5  2019
    

    Created on 2022-05-10 by the reprex package (v2.0.1)
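
The summarise() call above returns only id and year. If the other columns of the matching row are needed as well, one option is slice_min(), available in dplyr 1.0.0 and later. The following is a minimal sketch rather than a definitive answer; the `value` column is a made-up stand-in for the question's "othercolumns", and the names `first_rows` and `first_rows_alt` are only for illustration.

```r
library(dplyr)

# Hypothetical sample data; `value` stands in for the question's "othercolumns"
df <- data.frame(
  id    = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year  = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  value = letters[1:11]
)

# For each id, keep the whole row with the smallest year.
# with_ties = FALSE keeps a single row per id even if a year is repeated.
first_rows <- df %>%
  group_by(id) %>%
  slice_min(year, n = 1, with_ties = FALSE) %>%
  ungroup()

# Same idea without grouping: sort by year, then keep the first row per id
first_rows_alt <- df %>%
  arrange(year) %>%
  distinct(id, .keep_all = TRUE)
```

Neither variant depends on the input being sorted by year beforehand, which should make them a bit safer than first() on data whose row order is unknown.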