r, dplyr, duplicates

Extract (or isolate) 'group-wise constant' columns from a data frame, *using dplyr/tidyverse*


How can I extract (or isolate) group-wise constant columns from a data frame, using dplyr/tidyverse?

This is an update of Dowle/Hadley's decades-old question here. The earlier poster's example...

Using a contrived example from iris (to generate a dataset with columns that are constant by group):

library(dplyr)

irisX <- iris %>% mutate(
    numspec = as.numeric(Species),
    numspec2 = numspec * 2
)

Now I want to generate a dataset that keeps only the columns Species, numspec, and numspec2, with one row per species.

And I don't want to have to tell it which columns these are (constant by group) -- I want it to find these for me.

So what I want is

Species, numspec, numspec2
setosa, 1, 2
versicolor, 2, 4
virginica, 3, 6

Unlike in the older linked question, I want a tidyverse approach, so that I can understand it better and the code looks cleaner.

I tried something like

single_iris <- irisX %>%
    group_by(Species) %>%
    select_if(function(.) n_distinct(.) == 1)

But select_if ignores the grouping: the predicate is evaluated on each whole column, not within each Species group.


Solution

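  • If we want to use select, do it outside the grouping: keep the columns that have as many distinct values as there are species, then drop the duplicate rows. This compares counts only, so it assumes no other column happens to have exactly that many distinct values.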

    library(dplyr)
    irisX %>%
        select(where(~ n_distinct(.x) == n_distinct(irisX$Species))) %>%
        distinct()
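
  • If the check should really be evaluated within each group, one possible sketch (not part of the original answer) is to count distinct values per column inside every Species group and keep the columns whose count is 1 everywhere. The names counts_by_group and constant_cols are made up for illustration, and this assumes dplyr >= 1.0.0 for across() and where().

```r
library(dplyr)

# Rebuild the example data from the question
irisX <- iris %>%
  mutate(
    numspec  = as.numeric(Species),
    numspec2 = numspec * 2
  )

# Step 1: count distinct values of every non-grouping column within each Species
# (counts_by_group is a hypothetical helper name)
counts_by_group <- irisX %>%
  group_by(Species) %>%
  summarise(across(everything(), n_distinct), .groups = "drop")

# Step 2: keep the names of the columns whose count is 1 in every group
# (constant_cols is a hypothetical helper name)
constant_cols <- counts_by_group %>%
  select(-Species) %>%
  select(where(~ all(.x == 1))) %>%
  names()

# Step 3: keep the grouping column plus the constant columns, one row per group
single_iris <- irisX %>%
  select(Species, all_of(constant_cols)) %>%
  distinct()

single_iris
```

On this data both approaches should pick numspec and numspec2. They would differ for a column such as rep(1:3, 50): it has three distinct values overall, so the count comparison keeps it, but it varies within each species, so the group-wise check drops it.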