If I have this data:
df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
ID = c(1, 2, 3, 4, 5),
is_fruit = c("yes", "yes", "yes", "yes", "yes"))
and I want to keep only the unique rows, but ignore the ID
column such that the output looks like this:
df2 <- data.frame(name = c("apple", "orange"),
ID = c(1, 4),
is_fruit = c("yes", "yes"))
df2
# name ID is_fruit
#1 apple 1 yes
#2 orange 4 yes
How can I do this, ideally with dplyr?
You can use the distinct function. By specifying the variables explicitly, you keep only the rows that are unique on those columns, and any other columns (here ID) are ignored in the comparison. Also, from ?distinct:
If there are multiple rows for a given combination of inputs, only the first row will be preserved
library(dplyr)
# .keep_all = TRUE retains the columns not used for the comparison (here ID)
distinct(df1, name, is_fruit, .keep_all = TRUE)
# name ID is_fruit
#1 apple 1 yes
#2 orange 4 yes
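If you would rather not load dplyr, the same idea can be sketched in base R with duplicated(). This is only a minimal sketch under one assumption: every column except ID should take part in the comparison. The helper name key_cols is just illustrative.

```r
df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

# Columns to compare on: everything except ID (assumption for this sketch)
key_cols <- setdiff(names(df1), "ID")

# duplicated() flags each row whose key combination has already appeared,
# so negating it keeps the first row of every combination
df2 <- df1[!duplicated(df1[key_cols]), ]
df2
```

Unlike distinct, this keeps the original row names (1 and 4 here); rownames(df2) <- NULL resets them if that matters.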