python apache-spark pyspark apache-spark-sql azure-databricks

How to de-serialize the spark data frame into another data frame


I am trying to de-serialize a Spark data frame into another data frame, as shown below.

Existing Dataframe Data:

enter image description here

Existing Dataframe schema:

enter image description here

Expected Dataframe:

enter image description here

Can anyone help me with this?


Solution

  • You can use the explode function for that; it returns one output row for each element of an array column.

    from pyspark.sql.functions import explode
    # Bracket syntax is required because the column name contains a colon
    df = df.withColumn("ns2:fileName", explode(df["ns2:fileName"]))
    

    EDIT

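    from pyspark.sql.functions import arrays_zip, col, explode
    # Zip the two arrays element-wise (arrays_zip, Spark 2.4+), then explode to one row per pair
    zipped = df.withColumn("result", explode(arrays_zip(col("ns2:fileName"), col("ns2:alias"))))
    zipped.select(col("result")["ns2:fileName"].alias("ns2:fileName"), col("result")["ns2:alias"].alias("ns2:alias"))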
    

    Possible duplicate.
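
To make the row-level effect of the two approaches concrete, here is a plain-Python sketch that needs no Spark session. It only mimics the semantics on a list of dicts; the helper names explode_rows and explode_zipped and the sample values are hypothetical, and the padding and null handling reflect how Spark's arrays_zip and explode are understood to behave rather than anything shown in the question.

```python
from itertools import zip_longest


def explode_rows(rows, column):
    """Mimic explode: emit one copy of each row per element of row[column]."""
    out = []
    for row in rows:
        # A null (None) or empty array produces no output rows.
        for item in row[column] or []:
            out.append({**row, column: item})
    return out


def explode_zipped(rows, columns):
    """Mimic explode(arrays_zip(...)) followed by a select of the zipped fields."""
    out = []
    for row in rows:
        arrays = [row[c] for c in columns]
        # Assumption: a null input array makes the zipped result null,
        # so the row contributes nothing after explode.
        if any(a is None for a in arrays):
            continue
        # Elements are paired by position; shorter arrays are padded with None.
        for values in zip_longest(*arrays):
            out.append(dict(zip(columns, values)))
    return out


# Hypothetical sample data; the real data was only shown as an image.
rows = [{"id": 1, "ns2:fileName": ["a.txt", "b.txt"], "ns2:alias": ["A", "B"]}]

single = explode_rows(rows, "ns2:fileName")
paired = explode_zipped(rows, ["ns2:fileName", "ns2:alias"])
crossed = explode_rows(single, "ns2:alias")
```

Here single has two rows that each still carry the full ns2:alias array, paired has two rows that line up each file name with its alias, and crossed (exploding the columns one after the other) has four rows because every file name is combined with every alias. That cross product is the reason for zipping the arrays before exploding.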