I want to select rows in which (a or b) == (c or d) without having to write out all the combinations. For example:
a b c d
1 2 3 4
1 1 2 2
1 2 1 3
2 5 3 2
4 5 5 4
df$equal <- df$a == df$c | df$a == df$d | df$b == df$c | df$b == df$d
would result in:
a b c d equal
1 2 3 4 FALSE
1 1 2 2 FALSE
1 2 1 3 TRUE
2 5 3 2 TRUE
4 5 5 4 TRUE
Is there a way to condense the statement (a or b) == (c or d) so that I don't have to write out all four combinations? I need this for more complicated situations in which there are more combinations, e.g., (a or b) == (c or d) == (e or f) == (g or h).
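We could compare each of the first two columns against the other two with ==, flag the rows that have at least one match, and then combine the per-column flags with |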
df$equal <- Reduce(`|`, lapply(df[1:2], \(x) rowSums(df[3:4] == x) > 0))
-output
> df
a b c d equal
1 1 2 3 4 FALSE
2 1 1 2 2 FALSE
3 1 2 1 3 TRUE
4 2 5 3 2 TRUE
5 4 5 5 4 TRUE
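To make the one-liner easier to follow, here is the same computation unpacked into steps. This is only a sketch on the example data; the intermediate name `hits` is introduced here for illustration, and `function(x)` is used in place of `\(x)` so that it also runs on R versions before 4.1.

```r
df <- data.frame(a = c(1L, 1L, 1L, 2L, 4L),
                 b = c(2L, 1L, 2L, 5L, 5L),
                 c = c(3L, 2L, 1L, 3L, 5L),
                 d = c(4L, 2L, 3L, 2L, 4L))

# For each left-hand column, flag the rows where it matches 'c' or 'd'.
# df[3:4] == x compares x against each column row by row and returns a
# logical matrix; rowSums(...) > 0 is TRUE when either comparison matched.
hits <- lapply(df[1:2], function(x) rowSums(df[3:4] == x) > 0)

hits$a  # rows where a equals c or d
hits$b  # rows where b equals c or d

# Combine the per-column flags with an elementwise OR.
equal <- Reduce(`|`, hits)
```

With more left-hand columns, only the `df[1:2]` selection changes; `Reduce` folds any number of flag vectors.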
Or using if_any
library(dplyr)
df %>%
  mutate(equal = if_any(a:b, ~ .x == c | .x == d))
a b c d equal
1 1 2 3 4 FALSE
2 1 1 2 2 FALSE
3 1 2 1 3 TRUE
4 2 5 3 2 TRUE
5 4 5 5 4 TRUE
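If there are more columns and each of them should be compared against the 'a' and 'b' columns, select everything except those two. Note that -c(a, b) picks up every other column in the data, so drop any helper column (such as an existing 'equal') first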
df %>%
  mutate(equal = if_any(-c(a, b), ~ .x == a | .x == b))
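For the chained case from the question, (a or b) == (c or d) == (e or f), one possible reading is "some value appears in every pair". Under that assumption, a rough base R sketch could intersect the values of each group row by row. The helper `any_common` is hypothetical (not from any package), the columns 'e' and 'f' are made up for illustration, and all compared columns are assumed to be of the same type.

```r
# Hypothetical helper: TRUE for rows in which at least one value
# occurs in every column group.
any_common <- function(data, groups) {
  apply(data, 1, function(row) {
    # One vector of values per group, taken from the current row
    sets <- lapply(groups, function(cols) row[cols])
    # A non-empty intersection means a value is shared by all groups
    length(Reduce(intersect, sets)) > 0
  })
}

df <- data.frame(a = c(1L, 1L, 1L, 2L, 4L),
                 b = c(2L, 1L, 2L, 5L, 5L),
                 c = c(3L, 2L, 1L, 3L, 5L),
                 d = c(4L, 2L, 3L, 2L, 4L))

# Two groups: flags the same rows as the pairwise comparison
two <- any_common(df, list(c("a", "b"), c("c", "d")))

# Made-up third pair 'e', 'f', only to show the chained case
df3 <- cbind(df, e = c(1L, 2L, 1L, 2L, 9L), f = 0L)
three <- any_common(df3, list(c("a", "b"), c("c", "d"), c("e", "f")))
```

Because it loops over rows with `apply`, this will be slower than the vectorised versions on large data, and it treats NA as a matching value; if the chain should instead be read as pairwise comparisons between neighbouring groups, the logic would need to change accordingly.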
data
df <- structure(list(a = c(1L, 1L, 1L, 2L, 4L), b = c(2L, 1L, 2L, 5L,
5L), c = c(3L, 2L, 1L, 3L, 5L), d = c(4L, 2L, 3L, 2L, 4L)),
class = "data.frame", row.names = c(NA,
-5L))