Search code examples
dataframedictionarypysparkmultiple-columns

Pyspark how to convert columns to maps after grouping and pivoting


How to convert multiple columns to maps after grouping and pivoting in pyspark. Code below is not working as expected.

Input dataframe:

ID TYPE Order Address Phone
1 A Primary abc 111
1 B Primary def 222
1 A Secondary ghi 333
1 B Secondary jkl 444
2 A Primary mno 555
2 A Secondary pqr 666
2 B Primary stu 777
2 B Secondary vwy 888

Expected Output dataframe:

ID Primary_A_attributes Primary_B_attributes Secondary_A_attributes Secondary_B_attributes
1 {"Address" : "abc", "phone" : "111"} {"Address" : "def", phone" : "222"} {"Address" : "ghi", "phone" : "333"} {"Address" : "jkl", phone" : "444"}
2 {"Address":"mno", "phone":"555" {"Address" : "pqr", phone" : "666"} {"Address":"stu", "phone":"777" {"Address" : "vwy", phone" : "888"}

code used:

df.withColumn("collection",F.upper(F.concat_ws('_attributes_','order','type')))\
.groupBy('id').pivot("collection").agg(F.create_map(F.lit("Address"),F.col("Address")\
F.lit("phone"),F.col("phone"))).display()

Solution

  • desired solution :

    from pyspark.sql import functions as F
    from pyspark.sql import Window as W
    data =[("1","A","Primary","abc","111"),
    ("1","B","Primary","def","222"),
    ("1","A","Secondary","ghi","333"),
    ("1","B","Secondary","jkl","444"),
    ("2","A","Primary","mno","555"),
    ("2","A","Secondary","pqr","666"),
    ("2","B","Primary","stu","777"),
    ("2","B","Secondary","vwy","888")]
    schema=["ID","TYPE","Order","Address","Phone"]
    df_source = spark.createDataFrame(data,schema)
    expr_array=F.array(F.lit("Address"),F.lit("Phone"))
    df_fnl = df_source.withColumn("collection",F.concat_ws('_','order','type'))\
    .groupBy(["id"]).pivot("collection").agg(F.map_from_arrays(expr_array,F.array_union(F.collect_list("Address"),F.collect_list("Phone"))))
    
    df_fnl.select([F.col(col).alias(col+"_attributes") if col !='id' else col for col in df_fnl.columns ]).show(10,0)
    
    OUTPUT :
    ---+------------------------------+------------------------------+------------------------------+------------------------------+
    |id |Primary_A_attributes           |Primary_B_attributes           |Secondary_A_attributes         |Secondary_B_attributes         |
    +---+------------------------------+------------------------------+------------------------------+------------------------------+
    |1  |{Address -> abc, Phone -> 111}|{Address -> def, Phone -> 222}|{Address -> ghi, Phone -> 333}|{Address -> jkl, Phone -> 444}|
    |2  |{Address -> mno, Phone -> 555}|{Address -> stu, Phone -> 777}|{Address -> pqr, Phone -> 666}|{Address -> vwy, Phone -> 888}|
    +---+------------------------------+------------------------------+------------------------------+------------------------------+
    

    kindly UPVOTE IF YOU LIKE MY SOLUTION .