pyspark, aws-glue

pyspark dataframe array of struct to columns


I have a dataframe with the following structure:

"abc": array [
    "def": struct {
        "id": string,
        "value": string
    }
]

id can be "PHONE", "FAX", or "MAIL". Here is a sample:

technical_id  column_to_explode
1             [["PHONE", "083665xxxx"], ["FAX", "0325xxxxxx"]]
2             [["MAIL", "abc@xxx.com"]]
3             null

Is it possible to transform it to:

technical_id  column_to_explode                                 PHONE       FAX         MAIL
1             [["PHONE", "083665xxxx"], ["FAX", "0325xxxxxx"]]  083665xxxx  0325xxxxxx  null
2             [["MAIL", "abc@xxx.com"]]                         null        null        abc@xxx.com
3             null                                              null        null        null

I'm trying with explode, but it duplicates rows and I would rather avoid that.
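
Roughly, this is what I mean (a simplified sketch rather than my exact code; df is the dataframe above):

    from pyspark.sql import functions as F

    # one output row per element of column_to_explode, so technical_id 1
    # comes back twice: once for PHONE and once for FAX
    df.withColumn('kv', F.explode('column_to_explode')). \
        select('technical_id', 'kv.id', 'kv.value'). \
        show(truncate=False)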

Thanks.


Solution

  • You can do a pivot after the explode so that each unique id value becomes its own column. Here's an example.

    from pyspark.sql import functions as func

    # inline() turns each struct in the array into its own row with columns
    # _1 and _2; the pivot then turns each distinct _1 value into a column
    spark.sparkContext.parallelize([([('phone', 'abc'), ('email', 'xyz')], 1), ([('fax', 'klm')], 2)]). \
        toDF(['arr_of_structs', 'id']). \
        selectExpr('*', 'inline(arr_of_structs)'). \
        groupBy('id'). \
        pivot('_1'). \
        agg(func.first('_2')). \
        show(truncate=False)
    
    # +---+-----+----+-----+
    # |id |email|fax |phone|
    # +---+-----+----+-----+
    # |1  |xyz  |null|abc  |
    # |2  |null |klm |null |
    # +---+-----+----+-----+
    

    The input dataframe looks like the following:

    spark.sparkContext.parallelize([([('phone', 'abc'), ('email', 'xyz')], 1), ([('fax', 'klm')], 2)]). \
        toDF(['arr_of_structs', 'id']). \
        show(truncate=False)
    
    # +----------------------------+---+
    # |arr_of_structs              |id |
    # +----------------------------+---+
    # |[{phone, abc}, {email, xyz}]|1  |
    # |[{fax, klm}]                |2  |
    # +----------------------------+---+
    
    # root
    #  |-- arr_of_structs: array (nullable = true)
    #  |    |-- element: struct (containsNull = true)
    #  |    |    |-- _1: string (nullable = true)
    #  |    |    |-- _2: string (nullable = true)
    #  |-- id: long (nullable = true)
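
    To match the schema from the question (struct fields named id and value) while keeping column_to_explode and the row whose array is null, the same idea can be adapted roughly as follows. This is a sketch that assumes the question's dataframe is called df and that the only keys are PHONE, FAX and MAIL:

    from pyspark.sql import functions as func

    # inline_outer keeps rows whose array is null; listing the pivot values
    # guarantees the PHONE, FAX and MAIL columns even if a key never occurs
    pivoted = df. \
        selectExpr('technical_id', 'inline_outer(column_to_explode)'). \
        groupBy('technical_id'). \
        pivot('id', ['PHONE', 'FAX', 'MAIL']). \
        agg(func.first('value'))

    # join back to the original dataframe so column_to_explode is preserved
    df.join(pivoted, 'technical_id', 'left').show(truncate=False)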