
R's cbind functionality in Spark Scala


I have a requirement to cbind two DataFrames in Spark using Scala, the way cbind works in R; the DataFrames do not have an ID column. Any pointers to a readily available function for this, or some other workaround?

Example:

DF1:

    Name Age
    ABC  10
    BCD  11

DF2:

    Marks
    75
    85

Result needed:

    DF3:
    Name Age Marks
    ABC  10  75
    BCD  11  85

Solution

  • This works as a workaround: attach a generated row id to both DataFrames, join on it, and drop it again.

        import org.apache.spark.sql.functions.monotonically_increasing_id

        // Attach a synthetic row id to each DataFrame, join on it, then drop it
        val df1WithId = df1.withColumn("id", monotonically_increasing_id())
        val df2WithId = df2.withColumn("id", monotonically_increasing_id())
        val df3 = df1WithId.join(df2WithId, Seq("id"), "outer").drop("id")

    Note that monotonically_increasing_id() derives its values from the partition id and the row position within each partition, so the ids on the two sides only line up when both DataFrames are partitioned identically (for example, after coalescing each to a single partition).

    For Spark 1.6.*, the last line needs to be:

        val df3 = df1WithId.join(df2WithId, df1WithId("id") === df2WithId("id"), "outer").drop("id")
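
    For completeness, here is a self-contained sketch of the whole workaround, built from the sample data in the question. The SparkSession setup and the coalesce(1) calls are assumptions added for illustration; forcing each DataFrame into a single partition keeps the generated ids aligned on both sides.

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions.monotonically_increasing_id

        object CbindExample extends App {
          val spark = SparkSession.builder().master("local[*]").appName("cbind").getOrCreate()
          import spark.implicits._

          // The two DataFrames from the question, with no common key column
          val df1 = Seq(("ABC", 10), ("BCD", 11)).toDF("Name", "Age")
          val df2 = Seq(75, 85).toDF("Marks")

          // coalesce(1) is an assumption here: with a single partition, both
          // id sequences start at 0 and increase by 1, so the rows pair up
          val df1WithId = df1.coalesce(1).withColumn("id", monotonically_increasing_id())
          val df2WithId = df2.coalesce(1).withColumn("id", monotonically_increasing_id())

          val df3 = df1WithId.join(df2WithId, Seq("id"), "outer").drop("id")
          df3.show()
          // Expected contents (row order after a join is not guaranteed):
          // +----+---+-----+
          // |Name|Age|Marks|
          // +----+---+-----+
          // | ABC| 10|   75|
          // | BCD| 11|   85|
          // +----+---+-----+

          spark.stop()
        }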