
Keep the first record by date for each ID and Name in R


I have a data frame with the columns ID, Date, Code and Names. Each ID has multiple entries at different dates, with the same or different values in the Names column. Below is an example.

ID       Date        Code      Names
1     2010-12-09     1.1.1     Alpha
1     2010-12-15     1.1.1     Alpha
1     2010-12-15     1.1.1     Beta
2     2010-12-09     1.1.1     Beta
2     2010-12-17     1.1.1     Beta
3     2011-02-09     1.1.1     Gamma
3     2011-04-25     1.1.1     Gamma
4     2011-04-25     1.1.1     Tango
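To make the example reproducible, the sample data above can be created as follows. The name `your_df` matches the one used in the answer. Storing `Date` as a real `Date` with `as.Date()` is an assumption (the question does not say what type the column has), but it makes comparisons by date reliable.

```r
# Sample data from the question.
# Assumption: Date is converted to R's Date class rather than kept as text.
your_df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",  # same code on every row, recycled to all 8 rows
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)
```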

For each combination of ID and Names, I want to keep only the row with the earliest Date and delete the later rows that have the same ID and name. Below is my expected result.

ID       Date        Code      Names
1     2010-12-09     1.1.1     Alpha
1     2010-12-15     1.1.1     Beta
2     2010-12-09     1.1.1     Beta
3     2011-02-09     1.1.1     Gamma
4     2011-04-25     1.1.1     Tango

Solution

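  • You can use `slice_min()` from dplyr, grouping by both `ID` and `Names` (the `by` argument requires dplyr 1.1.0 or later):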

    library(dplyr)
    
    # For each ID/Names pair, keep the row(s) with the earliest Date
    slice_min(your_df, Date, by = c(ID, Names))
    
    #   ID       Date  Code Names
    # 1  1 2010-12-09 1.1.1 Alpha
    # 2  1 2010-12-15 1.1.1  Beta
    # 3  2 2010-12-09 1.1.1  Beta
    # 4  3 2011-02-09 1.1.1 Gamma
    # 5  4 2011-04-25 1.1.1 Tango
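If you prefer base R (or have a dplyr version older than 1.1.0, where `slice_min()` has no `by` argument), here is a minimal sketch of the same rule: sort by `ID` and `Date`, then keep the first occurrence of each `ID`/`Names` pair with `duplicated()`. It assumes `Date` sorts chronologically (a `Date` column, as built here, or ISO `YYYY-MM-DD` strings); the intermediate names `sorted` and `result` are illustrative only.

```r
# Sample data from the question (assumption: Date converted with as.Date)
your_df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)

# Sort by ID, then Date, so the earliest row of every ID/Names pair comes first
sorted <- your_df[order(your_df$ID, your_df$Date), ]

# duplicated() flags every repeat of an ID/Names pair after its first row;
# negating it keeps exactly one (the earliest) row per pair
result <- sorted[!duplicated(sorted[c("ID", "Names")]), ]
rownames(result) <- NULL
result
```

One design difference: `slice_min()` keeps all rows tied for the earliest date by default, while `duplicated()` always keeps exactly one. To get one row per pair with dplyr as well, add `with_ties = FALSE` to the `slice_min()` call.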