scala apache-spark anti-join

Left Anti join in Spark dataframes


I have two DataFrames, and I would like to retrieve only the rows of one of them that have no match in the inner join (see the picture):

[Image: Full outer join]

I have tried several approaches: an inner join followed by filtering the rows that contain at least one null, and every join type described in the Spark 1.6 docs, but I could not obtain the result with a single join.

Can anybody help?


Solution

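  • This is called a right excluding join. A right outer join keeps every row of df2 and fills the df1 columns with null where there is no match, so filtering on a null column1 leaves only the df2 rows that have no counterpart in df1. You can do it like this: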

    df1.join(df2, df1("column1") === df2("column2"), "right_outer").filter("column1 is null").show()
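
To see the right excluding join on concrete data, here is a minimal sketch. The sample rows, the `payload` column, and the `RightExcludingJoinSketch` object with its two helper methods are hypothetical and exist only for illustration. The sketch also assumes Spark 2.0 or later, since it uses `SparkSession` and the `"left_anti"` join type; neither is available in Spark 1.6, where only the outer join plus null filter applies.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, for illustration only.
object RightExcludingJoinSketch {

  // Rows of `right` with no match in `left`: right outer join, then keep
  // the rows where the left-side key came back null.
  def viaOuterJoin(left: DataFrame, right: DataFrame): DataFrame =
    left.join(right, left("column1") === right("column2"), "right_outer")
      .filter(left("column1").isNull)
      .select(right("column2"), right("payload"))

  // Same rows via an anti join (assumes Spark 2.0+). The anti join returns
  // only the columns of the DataFrame it is called on, so no select is needed.
  def viaAntiJoin(left: DataFrame, right: DataFrame): DataFrame =
    right.join(left, right("column2") === left("column1"), "left_anti")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("right-excluding-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: keys 2 and 3 match, key 4 exists only in df2.
    val df1 = Seq(1, 2, 3).toDF("column1")
    val df2 = Seq((2, "b"), (3, "c"), (4, "d")).toDF("column2", "payload")

    viaOuterJoin(df1, df2).show()
    viaAntiJoin(df1, df2).show()

    spark.stop()
  }
}
```

With this sample data, both calls should show only the df2 row whose key is 4. Note that the anti join is called on df2, so "left" in `"left_anti"` refers to df2 here; that is what makes it line up with the right excluding join above.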