
Finding rows in a data.frame that are the same on one variable but different on another variable in R


In my DATA below, how can I filter the rows where the Nm values are the same but the Descr values are different, to get my Desired_output below?

DATA <- read.table(header = TRUE, text = "
Cd    Nm     Descr
710   spa    Castilian
4260  spa    Spanish
2290  gr     Greek
1213  gr     Greek_b
1765  ger    German
2340  ita    Italian
")

Desired_output <- read.table(header = TRUE, text = "
Cd    Nm     Descr
710   spa    Castilian
4260  spa    Spanish
2290  gr     Greek
1213  gr     Greek_b
")


Solution

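  • With dplyr: within each Nm group, keep the rows only if the group has more than one row and none of its Descr values is duplicated. Duplicates are flagged with vec_duplicate_detect() from vctrs, and the .by argument of filter() requires dplyr 1.1.0 or later.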

    library(dplyr)
    
    # Per Nm group: keep rows only if no Descr value repeats and the group has 2+ rows
    DATA %>% 
      filter(all(!vctrs::vec_duplicate_detect(Descr)) & n() > 1, .by = Nm)
        Cd  Nm     Descr
    1  710 spa Castilian
    2 4260 spa   Spanish
    3 2290  gr     Greek
    4 1213  gr   Greek_b
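For comparison, here is a minimal base R sketch of the same rule (no dplyr or vctrs needed), together with a looser reading of "different" for contrast. The helper names keep_all_distinct, keep_any_different and filter_by_nm, and the extra row with Cd = 9999, are hypothetical and only serve as illustration. Both rules agree on the question's data, but they diverge when a Nm group repeats a Descr value alongside a different one.

```r
# Base R sketch; helper names and the extra Cd = 9999 row are hypothetical
DATA <- read.table(header = TRUE, text = "
Cd    Nm     Descr
710   spa    Castilian
4260  spa    Spanish
2290  gr     Greek
1213  gr     Greek_b
1765  ger    German
2340  ita    Italian
")

# Same rule as the dplyr answer: the Nm group has 2+ rows and no Descr repeats
keep_all_distinct <- function(descr) length(descr) > 1 && anyDuplicated(descr) == 0

# Looser reading: the Nm group contains at least two different Descr values
keep_any_different <- function(descr) length(unique(descr)) > 1

# Apply a per-group rule to Descr, then keep every row whose Nm group passed
filter_by_nm <- function(df, rule) {
  keep <- tapply(df$Descr, df$Nm, rule)   # one TRUE/FALSE per Nm value
  df[df$Nm %in% names(keep)[keep], ]
}

filter_by_nm(DATA, keep_all_distinct)

# Hypothetical extra row: "spa" now repeats "Castilian"
DATA2 <- rbind(DATA, data.frame(Cd = 9999L, Nm = "spa", Descr = "Castilian"))
filter_by_nm(DATA2, keep_all_distinct)    # spa dropped: Castilian repeats
filter_by_nm(DATA2, keep_any_different)   # spa kept: it has two distinct Descr
```

On the question's DATA both rules return the four spa/gr rows. On DATA2 the first rule (like the dplyr answer) drops the spa group entirely, while the second keeps all three spa rows, so it is worth checking which meaning of "different" fits your data.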