I have a data frame with two columns, let's call them "col1" and "col2". There are some rows where the values in "col1" are duplicated, but the values in "col2" are different. I want to remove the duplicates in "col1" where they have different values in "col2".
Here's a sample data frame:
df <- data.frame(col1 = c("A", "A","A","B","B","C","D","E","F","F"), col2 = c("g1", "g1", "g1", "g2", "g3", "g1","g4","g4","g5","g6"))
In this example, row 4 and 5 have same value in col1 but different value in col2. Similarly, row 9 and 10 same same value in col1 and different value in col2. I want to remove these rows.
The desire output would be
>df
col1 col2
A g1
A g1
A g1
C g1
D g4
E g4
I tried
df_1<-df %>% arrange(col1) %>% distinct(col1,col2,.keep_all=TRUE)
But again, this only select distinct values which is opposite to what i want. Also this dataframe contains more than 10 other columns. But I wanna focus on these two columns and get the output as complete dataframe with other columns too.
How can I do this in R?
Thanks in advance for your help!
Just a simple solution
df_filtered <- df %>%
group_by(col1) %>%
filter(n_distinct(col2) == 1) %>%
ungroup()