dplyr

Is semi_join commutative in nature?

From the definition of semi_join, I thought it was commutative, i.e. that swapping the two tables would give the same result. I'm not sure whether I'm misunderstanding something...

library(dplyr)

set.seed(100)
x1 <- tibble(
  id = floor(runif(100, 10, 1000))
)


set.seed(20)

x2 <- tibble(
  id = floor(runif(100, 10, 1000))
)


x1 |> semi_join(x2) |> arrange(id)
x2 |> semi_join(x1) |> arrange(id)


Solution

  • semi_join(x, y) returns all rows from x that have a match in y, so swapping the arguments may not give the same result, as illustrated here:

    library(dplyr)
    
    df1 <- tibble(x = c(1, 2, 3, 2))
    df2 <- tibble(x = c(2, 3, 4, 4))
    
    semi_join(df1, df2) # 3 rows from df1 match df2
    #> Joining with `by = join_by(x)`
    #> # A tibble: 3 × 1
    #>       x
    #>   <dbl>
    #> 1     2
    #> 2     3
    #> 3     2
    
    semi_join(df2, df1) # 2 rows from df2 match df1
    #> Joining with `by = join_by(x)`
    #> # A tibble: 2 × 1
    #>       x
    #>   <dbl>
    #> 1     2
    #> 2     3
    

    Created on 2024-04-24 with reprex v2.1.0
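
As a rough follow-up to the example above (a sketch reusing the same df1 and df2, with `by = "x"` spelled out to silence the join message; the names keys_12 and keys_21 are made up for illustration): the two directions keep the same set of key values, but each result keeps the rows, and therefore the duplicates, of its own left-hand table.

```r
library(dplyr)

df1 <- tibble(x = c(1, 2, 3, 2))
df2 <- tibble(x = c(2, 3, 4, 4))

# Distinct key values kept by each direction of the semi join
keys_12 <- semi_join(df1, df2, by = "x") |> distinct(x) |> pull(x)
keys_21 <- semi_join(df2, df1, by = "x") |> distinct(x) |> pull(x)

# Both directions keep the same set of keys (those present in both tables)
identical(sort(keys_12), sort(keys_21))
#> [1] TRUE

# The row counts differ: df1 contains the key 2 twice, df2 only once
nrow(semi_join(df1, df2, by = "x"))
#> [1] 3
nrow(semi_join(df2, df1, by = "x"))
#> [1] 2
```

So if only the matching key values are of interest, comparing the distinct keys gives the same answer in either direction; the full results would in general only coincide when the matching rows of both tables are identical, duplicates and columns included.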