
Transform scala to Spark


I have to transform the code below into Spark, but I don't understand what exactly this join performs in the code:

val tempFactDF = unionTempDF.join(fact.select("x","y","d","f","s"),
                                  Seq("x","y","d","f")).dropDuplicates

Solution

  • Here it is performing an inner join over multiple columns, and the list of join keys is given as Seq("x","y","d","f"). The trailing dropDuplicates then removes any exact duplicate rows from the joined result.

    It is equivalent to:

    val joiningTable = fact.select("x","y","d","f","s")
    unionTempDF.join(joiningTable,
      unionTempDF("x") === joiningTable("x") &&
      unionTempDF("y") === joiningTable("y") &&
      unionTempDF("d") === joiningTable("d") &&
      unionTempDF("f") === joiningTable("f"))

    One difference to be aware of: the Seq("x","y","d","f") form keeps a single copy of each join column in the output, while the explicit === form keeps both sides' copies (which then need disambiguation).
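To see what the join plus dropDuplicates computes, here is a minimal plain-Scala sketch (no Spark) of the same logic on ordinary collections. The Row case class, the integer key types, and the sample data are illustrative assumptions, not part of the original code:

```scala
// Hypothetical row shape: four join keys plus the extra column "s" from fact.
case class Row(x: Int, y: Int, d: Int, f: Int, s: String)

object JoinSketch {
  // Inner join on the four key columns (like Seq("x","y","d","f")),
  // then drop exact duplicate rows (like dropDuplicates).
  def joinAndDedup(left: Seq[(Int, Int, Int, Int)], fact: Seq[Row]): Seq[Row] = {
    val joined = for {
      (x, y, d, f) <- left                                  // each row of unionTempDF
      r <- fact if r.x == x && r.y == y && r.d == d && r.f == f  // matching fact rows
    } yield Row(x, y, d, f, r.s)
    joined.distinct                                          // dropDuplicates
  }
}
```

Rows on the left with no matching keys in fact are dropped (inner-join semantics), and repeated left rows that match the same fact row collapse to one output row after the distinct step.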