Search code examples
rjoinouter-join

Is there a function in R to perform an exclusive full join?


I am merging two dataframes that have some overlapping observations. These observations don't overlap on all columns so they are not identical, but they are the same on the columns I've decided are important for linking. How do I merge/join such that the matched observations are excluded?

I'm familiar with the different join functions and how to perform inner and outer joins using merge(), but I don't see an option for excluding the rows that would constitute an inner join.

This is a similar question on the topic, Exclusive Full Join in r but it assumes there are different columns in each dataframe that will produce NAs upon full join. How would you do it if the dataframes shared all the same columns?

The workaround I am using is to use duplicated() from first and last to remove the rows after full joining. Is there a more elegant way to get the complement of inner join?

df_joined <- merge(df1, df2, all = TRUE)
df_joined <- subset(df_joined, !(duplicated(df_joined[ 
,linking_cols])==TRUE | duplicated(df_joined[ ,linking_cols], fromLast = 
TRUE)==TRUE))

Solution

  • you need to combine two anti joins

    library(dplyr)
    bind_rows(
      anti_join(df1, df2),
      anti_join(df2, df1),
    )