I have followed this question, but the answers there are not working for me. I don't want a UDF for this, and map_concat doesn't work for me. Is there any other way to combine maps?
e.g.

| id | value |
|---|---|
| 1 | Map(k1 -> v1) |
| 2 | Map(k2 -> v2) |

The output should be:

| id | value |
|---|---|
| 1 | Map(k1 -> v1, k2 -> v2) |
Here is my solution, assuming that we can drop `id`:
from pyspark.sql import functions as f
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('test').getOrCreate()

data = [{'id': 1, 'map': {'k1': 'v1'}},
        {'id': 2, 'map': {'k2': 'v2'}},
        {'id': 3, 'map': {'k3': 'v3'}}]
df = spark.createDataFrame(data)

# remove id and add a constant grouping column so every row lands in one group
d_df = df.drop('id').withColumn('group_id', f.lit(1))

# aggregate the maps into an array of maps
g_df = d_df.groupBy('group_id') \
    .agg(f.collect_list('map').alias('maps'))

# fold the array with map_concat, starting from an empty map
# (f.aggregate requires Spark >= 3.1)
final_df = g_df.select(
    f.aggregate(
        'maps',
        f.create_map().cast('map<string,string>'),
        lambda acc, m: f.map_concat(acc, m)
    ).alias('map_of_maps')
)
final_df.show()
Result:
+--------------------+
| map_of_maps|
+--------------------+
|{k1 -> v1, k2 -> ...|
+--------------------+