Let's say I have this (simplified) data frame:
```r
C1 <- c('a','a','b','b','c','c')
C2 <- c(10,10,20,21,30,30)
C3 <- c(1.1,2.2,3.3,4.4,5.5,6.6)
df <- data.frame(C1,C2,C3)
```
C1 | C2 | C3 |
---|---|---|
a | 10 | 1.1 |
a | 10 | 2.2 |
b | 20 | 3.3 |
b | 21 | 4.4 |
c | 30 | 5.5 |
c | 30 | 6.6 |
What I'm trying to do is delete every row whose C1 value maps to more than one distinct value in the C2 column. In this case I would like to delete all rows with 'b' in C1, because 'b' is paired with two different C2 values (20 and 21).
This should result in the following df:
C1 | C2 | C3 |
---|---|---|
a | 10 | 1.1 |
a | 10 | 2.2 |
c | 30 | 5.5 |
c | 30 | 6.6 |
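To check the criterion directly, one can count the distinct C2 values for each C1 value and see which groups exceed one. A minimal base-R sketch; `n_c2` and `bad_c1` are illustrative names only:

```r
df <- data.frame(C1 = c('a','a','b','b','c','c'),
                 C2 = c(10,10,20,21,30,30),
                 C3 = c(1.1,2.2,3.3,4.4,5.5,6.6),
                 stringsAsFactors = FALSE)

# Number of distinct C2 values for each C1 value (a 1-d array named by C1)
n_c2 <- tapply(df$C2, df$C1, function(x) length(unique(x)))
print(n_c2)

# C1 values that map to more than one distinct C2 value; these rows should go
bad_c1 <- names(n_c2)[n_c2 > 1]
print(bad_c1)
```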
Any help would be really appreciated!
Thanks,
Yuval
`dplyr` is another way to do this. Use `group_by` to process each `C1` group separately, then `filter` each group, keeping only the groups that have a single distinct value of `C2`:
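```r
library(dplyr)

# Rebuild the example data from the question
C1 <- c('a','a','b','b','c','c')
C2 <- c(10,10,20,21,30,30)
C3 <- c(1.1,2.2,3.3,4.4,5.5,6.6)
df <- data.frame(C1,C2,C3)

df <- df %>%
  group_by(C1) %>%                      # evaluate each C1 value separately
  filter(length(unique(C2)) == 1) %>%   # keep groups with exactly one distinct C2
  ungroup()                             # drop the grouping from the result

print(df)
```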
Output:

```
# A tibble: 4 x 3
  C1       C2    C3
  <chr> <dbl> <dbl>
1 a        10   1.1
2 a        10   2.2
3 c        30   5.5
4 c        30   6.6
```
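If you'd rather avoid the dplyr dependency, the same group-wise test can be sketched in base R with `ave()`, which returns, for every row, the result of a function applied to that row's group. This is an alternative technique rather than part of the dplyr answer; `n_c2` and `result` are illustrative names:

```r
df <- data.frame(C1 = c('a','a','b','b','c','c'),
                 C2 = c(10,10,20,21,30,30),
                 C3 = c(1.1,2.2,3.3,4.4,5.5,6.6),
                 stringsAsFactors = FALSE)

# For every row, the number of distinct C2 values within that row's C1 group
n_c2 <- ave(df$C2, df$C1, FUN = function(x) length(unique(x)))

# Keep only rows whose C1 group maps to exactly one C2 value
result <- df[n_c2 == 1, ]

# Base subsetting keeps the original row names (1, 2, 5, 6); reset them
rownames(result) <- NULL
print(result)
```

Inside the dplyr pipeline, `n_distinct(C2) == 1` is an equivalent way to write the filter condition.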