I have a requirement to cbind (as it happens in R) two DataFrames in Spark using Scala; neither of them has an ID column. Any pointers to a readily available function for this, or some other workaround?
Example:
DF1:
Name Age
ABC 10
BCD 11
DF2:
Marks
75
85
Result needed:
DF3:
Name Age Marks
ABC 10 75
BCD 11 85
This works as a workaround, with one caveat: monotonically_increasing_id() guarantees unique, monotonically increasing IDs, but not consecutive ones, and the generated values depend on partitioning. The IDs on the two sides only line up when both DataFrames have the same partitioning (same number of partitions with the same row counts):
import org.apache.spark.sql.functions.monotonically_increasing_id

val df1WithId = df1.withColumn("id", monotonically_increasing_id())
val df2WithId = df2.withColumn("id", monotonically_increasing_id())
// Joining on Seq("id") keeps a single id column in the result.
val df3 = df2WithId.join(df1WithId, Seq("id"), "outer").drop("id")
For Spark 1.6.x, join with usingColumns does not accept a join type, so the last line needs an explicit column expression instead:

val df3 = df2WithId.join(df1WithId, df1WithId("id") === df2WithId("id"), "outer").drop("id")
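If you cannot guarantee identical partitioning, a safer variant is to build the row index with RDD.zipWithIndex, which assigns consecutive 0-based indices regardless of how the data is partitioned. A minimal sketch for Spark 2.x follows; the helper name cbind and the column name row_idx are my own, not part of any Spark API:

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

def cbind(spark: SparkSession, left: DataFrame, right: DataFrame): DataFrame = {
  // Attach a consecutive row index to a DataFrame via the RDD API.
  def withRowIndex(df: DataFrame): DataFrame = {
    val indexed = df.rdd.zipWithIndex.map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ idx)
    }
    val schema = StructType(
      df.schema.fields :+ StructField("row_idx", LongType, nullable = false))
    spark.createDataFrame(indexed, schema)
  }
  // Rows pair up by position, like R's cbind; the index column is dropped.
  withRowIndex(left).join(withRowIndex(right), Seq("row_idx")).drop("row_idx")
}

Usage would then be val df3 = cbind(spark, df1, df2). This assumes both DataFrames have the same number of rows, which cbind requires anyway; the zipWithIndex pass costs an extra job but removes the dependence on partitioning.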