Search code examples
rfilteracross

filter across columns using row predicate to keep only negative cell values


I'm interested in reducing a very large correlation matrix to keep only the cells and rows that have negative associations. I have already reduced it somewhat using df%>%filter_all(any_vars(.<0)) This is an example of the subset I get afterwards. How can I select the columns on the basis of their cell content without having to select by name: those that have any negative value (D and E)? I am partial to a tidyverse answer if possible but I'll take what I can get. I thought some sort of across() + if_else() since I don't mind turning all non-negative numbers into NA but I couldn't figure it out.

ex <- tribble(~A, ~B, ~C, ~D, ~E,
       "L", 0.133, 0.446, -0.0190, NA,
        "M", 0.166, 0.136,  0.0893, 0.0755,
        "N", 0.110, 0.159,  0.0872, -0.186,
        "O", 0.0161, NA, 0.0272, -0.0767,
      "P",  0.147, 0.0864, 0.0417, -0.0629)

Solution

  • We can use select with where. Create two conditions in a short-circuit (&&) to match the column type as numeric and there are any non-NA values less than 0

    library(dplyr)
    ex %>%
          dplyr::select(where(~ is.numeric(.) && any(.[complete.cases(.)] < 0)))
    

    -output

    # A tibble: 5 x 2
            D       E
        <dbl>   <dbl>
    1 -0.019  NA     
    2  0.0893  0.0755
    3  0.0872 -0.186 
    4  0.0272 -0.0767
    5  0.0417 -0.0629
    

    If we want to keep any rows that are less than 0 from the columns selected

    library(purrr)
    ex %>%
           dplyr::select(where(~ is.numeric(.) && any(.[complete.cases(.)] < 0))) %>%
           filter(if_any(everything(), ~ . < 0))
    # A tibble: 4 x 2
            D       E
        <dbl>   <dbl>
    1 -0.019  NA     
    2  0.0872 -0.186 
    3  0.0272 -0.0767
    4  0.0417 -0.0629
    

    If we want to keep the other column types as well

    ex %>%
         dplyr::select(c(where(negate(is.numeric)),
             where(~ is.numeric(.) && any(.[complete.cases(.)] < 0)))) %>% 
         filter(if_any(where(is.numeric), ~ . < 0))
    # A tibble: 4 x 3
      A           D       E
      <chr>   <dbl>   <dbl>
    1 L     -0.019  NA     
    2 N      0.0872 -0.186 
    3 O      0.0272 -0.0767
    4 P      0.0417 -0.0629