I am coding in Python on Databricks with Spark 2.4.5.
I need a function with two parameters: the first is a DataFrame and the second is the surrogate-key column (SKid) of that DataFrame. The function should then hash all columns of the DataFrame.
I have written the code below, but how can I concatenate all columns of a DataFrame dynamically?
from pyspark.sql.functions import col, md5, concat, lit

def xHashDataframe(df, skColumn):
    a = df.select(
        col(skColumn),
        md5(
            concat(
                col("column1"), lit("~"),
                col("column2"), lit("~"),
                ...
                col("columnN"), lit("~")
            )
        ).alias("RowHash")
    )
    return a
There is no need for a UDF here. concat_ws should do the trick, since it takes a separator followed by any number of columns, so you can unpack df.columns into it. As a bonus, unlike concat, it skips null values rather than returning null for the whole row:
df.withColumn("RowHash", F.md5(F.concat_ws("~", *df.columns))).show(truncate=False)