Tags: r, dplyr, tidyverse

Using any() vs | in dplyr::mutate


Why should I use | vs any() when I'm comparing columns in dplyr::mutate()?

And why do they return different answers?

For example:

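library(tidyverse)

# tibble() replaces the deprecated data_frame(); TRUE/FALSE are safer than T/F
df <- tibble(
  x    = rep(c(TRUE, FALSE, TRUE), 4),
  y    = rep(c(TRUE, FALSE, TRUE, FALSE), 3),
  allF = FALSE,
  allT = TRUE
)

df %>%
  mutate(
    withpipe = x | y,          # returns the expected result for each row
    usingany = any(c(x, y))    # returns TRUE for every row
  )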

What's going on here and why should I use one way of comparing values over another?


Solution

  • The difference between the two is how the answer is calculated:

    • | is vectorised: it compares elements pairwise and returns one logical value per pair. In the example above, each x and y pair is compared, giving 12 results, one for each row of the data frame.
    • any(), on the other hand, looks at the entire vector and returns a single value. In the example above, the mutate() line that creates the usingany column is effectively computing any(c(df$x, df$y)), which returns TRUE because there is at least one TRUE value in df$x or df$y. That single value is then recycled to every row of the data frame.
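
The same distinction shows up outside of mutate() with plain vectors. A minimal sketch in base R, reusing the x and y values from the question (the names elementwise, collapsed and recycled are only illustrative):

```r
x <- rep(c(TRUE, FALSE, TRUE), 4)
y <- rep(c(TRUE, FALSE, TRUE, FALSE), 3)

# `|` is vectorised: one result per pair of elements
elementwise <- x | y
length(elementwise)   # 12

# any() collapses its whole input to a single logical value
collapsed <- any(c(x, y))
length(collapsed)     # 1

# Roughly what happens when a length-1 value becomes a 12-row column:
# it is repeated once per row
recycled <- rep(collapsed, length(x))
```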

    You can see this in action using the other columns in your data frame:

    df %>%
        mutate(
            usingany = any(c(x, y)),  # returns all TRUE
            allfany  = any(allF)      # returns all FALSE because every value in df$allF is FALSE
        )
    

    To answer when you should use which: use | when you want to compare elements row-wise. Use any() when you want a single answer about an entire column (or set of columns) rather than one answer per row.
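
As a rough illustration of that split, the sketch below pairs | with mutate() (one value per row) and any() with summarise() (one value overall). It assumes a reasonably recent dplyr and rebuilds the question's data with tibble(); the names per_row, overall, either, any_xy and any_allF are made up for the example.

```r
library(dplyr)

# Same data as in the question
df <- tibble(
  x    = rep(c(TRUE, FALSE, TRUE), 4),
  y    = rep(c(TRUE, FALSE, TRUE, FALSE), 3),
  allF = FALSE,
  allT = TRUE
)

# Row-wise question: "is x or y TRUE in this row?" -> one value per row
per_row <- df %>%
  mutate(either = x | y)

# Whole-column question: "is anything TRUE at all?" -> a single row of answers
overall <- df %>%
  summarise(any_xy = any(x | y), any_allF = any(allF))
```

Here per_row keeps all 12 rows, while overall collapses to a single row, which is usually the shape you want when any() is the right tool.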

    TL;DR: inside dplyr::mutate(), you will usually want |.