r dataframe dplyr subset

Finding the groups in which a variable takes more than one unique value


In DATA below, how can I find the study_id values for which the variable scale takes on more than one unique value?

The expected answer is Li (scale for Li takes the values Other and MBTI). How can I get this with base R or dplyr code?

m="
study_id   year es_id       r     se     n pub_type  context  ed_setting  age_grp L1    L2    prof  scale outcome
Dreyer     1992   130  0      0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Listen~
Dreyer     1992   131  0.04   0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Writing
Dreyer     1992   132 -0.03   0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Reading
Dreyer     1992   133  0      0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Overall
Ghapanchi  2011    89  0.31   0.0806   141 JournalA~ Foreign~ CollegeUni~ Adult   Pers~ Engl~ NA    Other Overall
Hassan     2001   177  0.25   0.117     71 NA        Foreign~ CollegeUni~ NA      Arab~ Engl~ NA    Other Speaki~
Kralova    2012   137  0.0252 0.117     75 JournalA~ Foreign~ CollegeUni~ Adult   Slov~ Engl~ Inte~ Other Speaki~
Li         2009    55 -0.04   0.132     59 JournalA~ Foreign~ CollegeUni~ Adult   Chin~ Engl~ NA    Other Grammar
Li         2009    56  0.355  0.124     59 JournalA~ Foreign~ CollegeUni~ Adult   Chin~ Engl~ NA    Other Pragma~
Li         2003    57  0.039  0.0735   187 JournalA~ Foreign~ CollegeUni~ Multip~ Chin~ Engl~ NA    MBTI  Overall
"

DATA <- read.table(text = m, header = TRUE)

Solution

  • Here's a way in dplyr as well as base R -

    The idea is to keep the rows of every study_id group that has more than one unique scale value, then reduce the result to the distinct study_id values.

    library(dplyr)
    
    DATA %>%
      group_by(study_id) %>%
      dplyr::filter(n_distinct(scale) > 1) %>%
      ungroup() %>%
      distinct(study_id)
    
    # study_id
    #  <chr>   
    #1 Li      
    

    Base R -

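    # Passing row indices to ave() keeps the per-group count numeric,
    # whether scale is stored as character or as a factor
    unique(subset(DATA, ave(seq_along(scale), study_id, 
           FUN = function(i) length(unique(scale[i]))) > 1, select = study_id))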
    
    #  study_id
    #8       Li
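
To see the counts that both approaches filter on, the number of distinct scale values per study_id can be tabulated with tapply(). This is only a sketch, not part of the original answer: `toy` is a hypothetical stand-in for DATA that keeps just the two relevant columns, and `n_scales` and `multi` are illustrative names.

```r
# Hypothetical stand-in for DATA with only the two relevant columns
toy <- data.frame(
  study_id = c("Dreyer", "Dreyer", "Ghapanchi", "Li", "Li", "Li"),
  scale    = c("Other", "Other", "Other", "Other", "Other", "MBTI"),
  stringsAsFactors = FALSE
)

# Count the distinct scale values within each study_id
n_scales <- tapply(toy$scale, toy$study_id, function(x) length(unique(x)))

# Keep the names of the groups with more than one distinct value
multi <- names(n_scales)[n_scales > 1]
print(multi)
```

Unlike the subset() version, this returns a plain character vector of study_id values rather than a one-column data frame, which may be handier if the result is only used for a lookup.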