How can I extract (or isolate) group-wise constant columns from a data frame, using dplyr/tidyverse?
This is an update of Dowle/Hadley's decades-old question here. The earlier poster's example...
Using a contrived example built from iris (to get a dataset with columns that are constant within each group):
library(dplyr)
irisX <- iris %>% mutate(
  numspec = as.numeric(Species),
  numspec2 = numspec * 2
)
Now I want to generate a dataset that keeps only the columns Species, numspec, and numspec2 (and keeps only one row for each group).
And I don't want to have to tell it which columns these are (constant by group) -- I want it to find these for me.
So what I want is
Species, numspec, numspec2
setosa, 1, 2
versicolor, 2, 4
virginica, 3, 6
Unlike in the older linked question, I want a tidyverse solution, so that I can understand it better and the code looks cleaner.
I tried something like
single_iris <- irisX %>%
  group_by(Species) %>%
  select_if(function(.) n_distinct(.) == 1)
But select_if ignores the grouping.
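If we want to use select, do it outside the grouping: keep the columns that have as many distinct values as there are groups, then drop the duplicate rows.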
library(dplyr)
irisX %>%
  select(where(~ n_distinct(.) == n_distinct(irisX$Species))) %>%
  distinct()
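One caveat: comparing the overall distinct count with the number of groups is a heuristic. A column that happens to have exactly three distinct values overall, yet varies within a species, would pass the test too. Below is a minimal sketch of a stricter check that counts distinct values inside each group. It assumes dplyr >= 1.0 (for across() and where()), and the names const_cols and single_iris are chosen here only for illustration.

```r
library(dplyr)

# Rebuild the example data from the question.
irisX <- iris %>%
  mutate(
    numspec  = as.numeric(Species),
    numspec2 = numspec * 2
  )

# Count the distinct values of every non-grouping column within each Species,
# then keep the names of the columns whose count is 1 in every group.
# The is.numeric() guard skips the Species column itself, which is a factor.
const_cols <- irisX %>%
  group_by(Species) %>%
  summarise(across(everything(), n_distinct), .groups = "drop") %>%
  select(where(~ is.numeric(.x) && all(.x == 1))) %>%
  names()

# Keep the grouping column plus the group-wise constant columns, one row per group.
single_iris <- irisX %>%
  select(Species, all_of(const_cols)) %>%
  distinct()

print(single_iris)
```

On this data the result should match the three-row table in the question. The trade-off is an extra summarise pass, in exchange for not depending on how many distinct values a column has overall.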