Tags: r, dplyr, tidyverse, rowwise

How to select rows with exactly 2 unique values with tidyverse?


What I have:

library(magrittr)
set.seed(1234)
what_i_have <- tibble::tibble(
    A = c(0, 1) |> sample(5, replace = TRUE),
    B = c(0, 1) |> sample(5, replace = TRUE),
    C = c(0, 1) |> sample(5, replace = TRUE)
)

It looks like this:

> what_i_have
# A tibble: 5 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1     1     1     1
2     1     0     1
3     1     0     1
4     1     0     0
5     0     1     1

What I want:

what_i_want <- what_i_have %>% .[apply(., 1, function(row) row |> unique() |> length() == 2),]

It looks like this:

# A tibble: 4 x 3
      A     B     C
  <dbl> <dbl> <dbl>
1     1     0     1
2     1     0     1
3     1     0     0
4     0     1     1

My question is: is there a tidyverse way to do the above?

I tried this:

what_i_have |> 
    dplyr::rowwise() |> 
    dplyr::filter_all(function(row) row |> unique() |> length() == 2)

but it returns the following empty tibble, and I do not know why:

# A tibble: 0 x 3
# Rowwise: 
# … with 3 variables: A <dbl>, B <dbl>, C <dbl>

Thank you.


Solution

  • Here is one option with tidyverse. rowwise() makes each row its own group, c_across() collects that row's values into a single vector, and n_distinct() counts the distinct values in it, so filter() keeps only the rows with exactly 2 unique values.

    library(tidyverse)
    
    what_i_have %>%
      rowwise() %>%
      filter(n_distinct(c_across(everything())) == 2) %>%
      ungroup()  # drop the rowwise grouping afterwards
    

    Output (for the data defined under Data below)

          A     B     C
      <dbl> <dbl> <dbl>
    1     0     1     1
    2     1     0     1
    3     1     0     0
    4     1     1     0
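
    As a rough check of why this works where the filter_all attempt in the question does not, the sketch below counts distinct values per row and, for contrast, per single cell. It assumes dplyr >= 1.0.0 (for c_across) and rebuilds the data from the Data section with tibble(); the names counts, n_vals and n_in_A are only illustrative. filter_all applies its predicate to each column separately, so under rowwise it sees one value at a time and the count can never be 2.

    ```r
    library(dplyr)

    # Same values as in the Data section of this answer
    what_i_have <- tibble(
      A = c(0, 1, 1, 1, 1),
      B = c(1, 0, 0, 1, 1),
      C = c(1, 1, 0, 1, 0)
    )

    counts <- what_i_have %>%
      rowwise() %>%
      mutate(
        n_vals = n_distinct(c_across(A:C)),  # distinct values across the whole row
        n_in_A = n_distinct(A)               # one column, one row: a single value
      ) %>%
      ungroup()

    counts$n_vals  # 2 2 2 1 2: only row 4 (1, 1, 1) fails the == 2 test
    counts$n_in_A  # 1 1 1 1 1: a per-column predicate never reaches 2
    ```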
    

    A mixed approach that combines dplyr's filter with base R's apply could be:

    what_i_have %>%
      filter(apply(., 1, \(x) length(unique(x))) == 2)
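
    To make the one-liner above easier to follow, here is the same idea split into two steps. This is only a sketch: n_unique and what_i_want are illustrative names, and the data is the one from the Data section rebuilt with tibble(). One caveat: apply() first converts the data frame to a matrix, so if the columns had mixed types (for example numeric and character) every value would be coerced to character before being compared.

    ```r
    library(dplyr)

    # Same values as in the Data section of this answer
    what_i_have <- tibble(
      A = c(0, 1, 1, 1, 1),
      B = c(1, 0, 0, 1, 1),
      C = c(1, 1, 0, 1, 0)
    )

    # Step 1: number of distinct values in each row (MARGIN = 1 means "by row")
    n_unique <- apply(what_i_have, 1, \(x) length(unique(x)))

    # Step 2: keep the rows where that count is exactly 2
    what_i_want <- what_i_have %>% filter(n_unique == 2)
    ```

    The \(x) shorthand and the native pipe used in the question both need R 4.1 or later; on older versions, function(x) works the same way.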
    

    Data

    what_i_have  <-
      structure(
        list(
          A = c(0, 1, 1, 1, 1),
          B = c(1, 0, 0, 1, 1),
          C = c(1, 1, 0, 1, 0)
        ),
        class = c("tbl_df", "tbl", "data.frame"),
        row.names = c(NA, -5L)
      )