I have a data frame with the columns ID, Date, Code and Names. An ID can appear on several dates, with the same or different values in the Names column. Below is an example.
ID Date Code Names
1 2010-12-09 1.1.1 Alpha
1 2010-12-15 1.1.1 Alpha
1 2010-12-15 1.1.1 Beta
2 2010-12-09 1.1.1 Beta
2 2010-12-17 1.1.1 Beta
3 2011-02-09 1.1.1 Gamma
3 2011-04-25 1.1.1 Gamma
4 2011-04-25 1.1.1 Tango
For each combination of ID and Names, I want to keep only the row with the earliest date and delete the later rows that repeat the same name. Below is an example of the resulting data frame.
ID Date Code Names
1 2010-12-09 1.1.1 Alpha
1 2010-12-15 1.1.1 Beta
2 2010-12-09 1.1.1 Beta
3 2011-02-09 1.1.1 Gamma
4 2011-04-25 1.1.1 Tango
You can use slice_min:
library(dplyr)
# For each ID/Names pair, keep the row(s) with the earliest Date (ties are kept)
slice_min(your_df, Date, by = c(ID, Names))
# ID Date Code Names
# 1 1 2010-12-09 1.1.1 Alpha
# 2 1 2010-12-15 1.1.1 Beta
# 3 2 2010-12-09 1.1.1 Beta
# 4 3 2011-02-09 1.1.1 Gamma
# 5 4 2011-04-25 1.1.1 Tango
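If your dplyr version does not have the by argument (it was added in dplyr 1.1.0), the same result can be sketched with group_by. The data frame below is rebuilt from the question's example, with Date parsed via as.Date; the name result and the choice of with_ties = FALSE (to guarantee one row per group) are my own assumptions, not part of the original answer.

```r
library(dplyr)

# Rebuild the example data from the question; Date is stored as a real Date.
your_df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)

# Grouped form: one row per ID/Names pair, the one with the earliest Date.
# with_ties = FALSE (an assumption here) drops extra rows that share the
# earliest Date within a group.
result <- your_df %>%
  group_by(ID, Names) %>%
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Leave with_ties at its default of TRUE if you would rather keep every row that shares the earliest date within a group.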