
R: use across() to find rows whose values are all positive or all negative (tidyverse)


The dplyr vignette "Column-wise operations" has this example:

library(dplyr)

df <- tibble(x = c("a", "b"), y = c(1, 1), z = c(-1, 1))
# Find all rows where EVERY numeric variable is greater than zero
df %>% filter(across(where(is.numeric), ~ .x > 0))
#> # A tibble: 1 x 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 b         1     1
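In dplyr 1.0.8 and later, passing across() to filter() is deprecated and raises a warning suggesting if_any() or if_all() instead. A minimal sketch of the same vignette example written with if_all(), which returns one logical value per row:

```r
library(dplyr)

df <- tibble(x = c("a", "b"), y = c(1, 1), z = c(-1, 1))

# if_all() is TRUE for a row only when the predicate holds in every selected column
res <- df %>% filter(if_all(where(is.numeric), ~ .x > 0))
print(res)
```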

If we change the tibble a bit:

df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

and want to keep only the rows where both columns are positive or both are negative, we have to name the columns:

df %>% filter((y > 0 & z > 0) | (y < 0 & z < 0))
#> # A tibble: 2 x 3
#>  x         y     z
#>  <chr> <dbl> <dbl>
#> 1 b         1     1
#> 2 c        -1    -1

How can this be done with across()? This attempt keeps every row, because .x > 0 | .x < 0 is TRUE for any non-zero value:

df %>% filter(across(where(is.numeric), ~ .x > 0 | .x < 0))
#> # A tibble: 3 x 3
#>  x         y     z
#>  <chr> <dbl> <dbl>
#> 1 a         1    -1
#> 2 b         1     1
#> 3 c        -1    -1
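To see what each row actually looks like, a small diagnostic sketch can add the per-column result of .x > 0 as extra columns (the names y_pos, z_pos and attempt are arbitrary, chosen only for illustration). Row a gives the pattern c(T, F), row b c(T, T) and row c c(F, F), while the attempted condition is TRUE for all three:

```r
library(dplyr)

df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

patterns <- df %>%
  mutate(
    # y_pos and z_pos hold .x > 0 for each numeric column
    across(c(y, z), ~ .x > 0, .names = "{.col}_pos"),
    # what filter(across(...)) checks: every column is positive OR negative
    attempt = (y > 0 | y < 0) & (z > 0 | z < 0)
  )
print(patterns)
```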

Solution

  • For each row, test .x > 0 on every numeric column. That gives a pattern such as c(T, T), c(T, F) or c(F, F), and we want only the rows where the results are either all TRUE or all FALSE. Now:

    • if_all(..., ~ .x > 0) is TRUE only for c(T, T)
    • if_any(..., ~ .x > 0) is TRUE for c(T, T) and c(T, F), so its negation !if_any(..., ~ .x > 0) is TRUE only for c(F, F)
    • these two are joined with | (OR)
    • so only the c(T, T) and c(F, F) rows are kept

    So this does the job:

    df %>% filter(if_all(where(is.numeric), ~ .x > 0) | !if_any(where(is.numeric), ~ .x > 0))
    
    # A tibble: 2 x 3
      x         y     z
      <chr> <dbl> <dbl>
    1 b         1     1
    2 c        -1    -1
    

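    Alternative, with a second if_all() for the all-negative case (calling across() inside filter() is deprecated since dplyr 1.0.8):

    df %>% filter(if_all(where(is.numeric), ~ .x > 0) | if_all(where(is.numeric), ~ .x < 0))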
    
    # A tibble: 2 x 3
      x         y     z
      <chr> <dbl> <dbl>
    1 b         1     1
    2 c        -1    -1
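The "none positive" check and a strict "all negative" check only differ when a value can be exactly zero: !if_any(..., ~ .x > 0) means no column is positive, so a row such as (0, -1) passes, while if_all(..., ~ .x < 0) requires every value to be strictly negative. A small sketch with a made-up row "d" containing a zero:

```r
library(dplyr)

# Made-up data for illustration: row "d" contains a zero
df0 <- tibble(x = c("a", "b", "c", "d"),
              y = c(1, 1, -1, 0),
              z = c(-1, 1, -1, -1))

# All positive, or none positive: zeros count as non-positive
loose <- df0 %>%
  filter(if_all(where(is.numeric), ~ .x > 0) | !if_any(where(is.numeric), ~ .x > 0))

# All positive, or all strictly negative: rows with zeros are dropped
strict <- df0 %>%
  filter(if_all(where(is.numeric), ~ .x > 0) | if_all(where(is.numeric), ~ .x < 0))

print(loose$x)   # "b" "c" "d"
print(strict$x)  # "b" "c"
```

Which one is right depends on whether zero should count as "negative" for your data.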
    

    Let's check it on a bigger example:

    set.seed(201)
    df <- data.frame(A = LETTERS[1:10], x = rnorm(10), y = rnorm(10), z = -1*rnorm(10))
    
    > df
       A           x           y           z
    1  A  0.28606069  0.69329617  0.24400084
    2  B -0.34454603  0.22380936  0.98825314
    3  C  0.32576373  0.39845694 -1.24206048
    4  D -1.69658097  1.01347438  1.68266603
    5  E -1.28548252 -0.64785307 -1.44289063
    6  F -0.07503189  0.64845271  0.46543975
    7  G  0.26693735  0.20734270 -0.69366150
    8  H  0.05593404  0.06439014  0.08772557
    9  I -2.30403431  0.66938092  0.95508038
    10 J  0.18900414 -0.37425445 -0.17010088
    
    > df %>% filter(if_all(where(is.numeric), ~ .x > 0) | !if_any(where(is.numeric), ~ .x > 0))
      A           x           y           z
    1 A  0.28606069  0.69329617  0.24400084
    2 E -1.28548252 -0.64785307 -1.44289063
    3 H  0.05593404  0.06439014  0.08772557
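As an extra sanity check, here is a sketch that compares the dplyr filter against a plain base R computation (a different technique, shown only for comparison): build a logical matrix of "is positive" flags and keep the rows where the count of TRUE values is either the number of columns or zero. The two should select the same rows; no particular values are assumed here, since the random draws depend on the R version's RNG.

```r
library(dplyr)

set.seed(201)
df <- data.frame(A = LETTERS[1:10], x = rnorm(10), y = rnorm(10), z = -1 * rnorm(10))

# dplyr version: all positive, or none positive
dplyr_rows <- df %>%
  filter(if_all(where(is.numeric), ~ .x > 0) | !if_any(where(is.numeric), ~ .x > 0)) %>%
  pull(A)

# Base R version: logical matrix of "is positive" flags for the numeric columns
pos <- as.matrix(df[vapply(df, is.numeric, logical(1))]) > 0
base_rows <- df$A[rowSums(pos) == ncol(pos) | rowSums(pos) == 0]

print(dplyr_rows)
print(base_rows)
```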