pyspark transpose explode

How to transpose data in PySpark across multiple columns


I am trying to transpose data in PySpark. I was able to transpose a single column, but with multiple columns I am not sure how to pass the parameters to the explode function.

Input format:

[Image: input table]

Output format:

[Image: output table]

Can someone point me to an example or reference? Thanks in advance.


Solution

  • The input data recreated as a PySpark DataFrame:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Wide input: one row per id, one (college, degree) column pair per education slot
    df_a = spark.createDataFrame(
        [(1, 'xyz', 'MS', 'abc', 'Phd', 'pqr', 'BS'),
         (2, 'POR', 'MS', 'ABC', 'Phd', '', '')],
        ["id",
         "Education1CollegeName", "Education1Degree",
         "Education2CollegeName", "Education2Degree",
         "Education3CollegeName", "Education3Degree"])
    df_a.show()

    +---+---------------------+----------------+---------------------+----------------+---------------------+----------------+
    | id|Education1CollegeName|Education1Degree|Education2CollegeName|Education2Degree|Education3CollegeName|Education3Degree|
    +---+---------------------+----------------+---------------------+----------------+---------------------+----------------+
    |  1|                  xyz|              MS|                  abc|             Phd|                  pqr|              BS|
    |  2|                  POR|              MS|                  ABC|             Phd|                     |                |
    +---+---------------------+----------------+---------------------+----------------+---------------------+----------------+
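Conceptually, the transpose turns each wide row into one narrow row per education slot. Below is a plain-Python sketch of that reshaping on the same sample rows, with no Spark involved. `to_long` is a hypothetical helper used only for illustration, and it assumes the values after `id` come in (college, degree) pairs.

```python
# Plain-Python sketch of the wide-to-long reshape (no Spark involved).
# Assumes each row is (id, college1, degree1, college2, degree2, ...).


def to_long(rows, pairs_per_row):
    """Split each wide row into one (id, college, degree) row per slot."""
    out = []
    for row in rows:
        row_id, values = row[0], row[1:]
        if len(values) != 2 * pairs_per_row:
            raise ValueError(f"row {row_id!r} does not hold {pairs_per_row} pairs")
        for i in range(pairs_per_row):
            # Slot i occupies positions 2*i (college) and 2*i + 1 (degree)
            college, degree = values[2 * i], values[2 * i + 1]
            out.append((row_id, college, degree))
    return out


wide = [
    (1, 'xyz', 'MS', 'abc', 'Phd', 'pqr', 'BS'),
    (2, 'POR', 'MS', 'ABC', 'Phd', '', ''),
]

for r in to_long(wide, 3):
    print(r)
```

This produces the same six (id, B, C) rows that the Spark answer shows, including the empty-string row for id 2.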
    

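    Code: `stack(n, expr1, ..., exprk)` splits the k listed values into n rows, so the six Education columns with n = 3 give three (B, C) rows per id.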

    df = df_a.selectExpr(
        "id",
        "stack(3, Education1CollegeName, Education1Degree, "
        "Education2CollegeName, Education2Degree, "
        "Education3CollegeName, Education3Degree) as (B, C)")
    df.show()

    +---+---+---+
    | id|  B|  C|
    +---+---+---+
    |  1|xyz| MS|
    |  1|abc|Phd|
    |  1|pqr| BS|
    |  2|POR| MS|
    |  2|ABC|Phd|
    |  2|   |   |
    +---+---+---+
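Typing out every column gets tedious as the number of EducationN slots grows. Below is a minimal sketch that builds the same `stack` expression string in plain Python. It assumes the columns follow the `Education{i}CollegeName` / `Education{i}Degree` naming pattern; `build_stack_expr` and its parameters are hypothetical, not part of PySpark.

```python
# Hypothetical helper: builds a Spark SQL stack(...) expression string.
# Assumes columns are named f"{prefix}{i}{field}" for i = 1..n_slots.


def build_stack_expr(n_slots, fields=("CollegeName", "Degree"),
                     prefix="Education", aliases=("B", "C")):
    """Return 'stack(n, col1, col2, ...) as (alias1, alias2)'."""
    if len(fields) != len(aliases):
        raise ValueError("need one alias per field")
    # Columns are listed slot by slot so each output row gets one slot's fields
    cols = [f"{prefix}{i}{field}"
            for i in range(1, n_slots + 1)
            for field in fields]
    return f"stack({n_slots}, {', '.join(cols)}) as ({', '.join(aliases)})"


print(build_stack_expr(3))
```

The returned string can be passed to `selectExpr` in place of the hand-written one, e.g. `df_a.selectExpr("id", build_stack_expr(3))`. If the empty row for id 2 is unwanted, a filter such as `.filter("B != ''")` should drop rows whose college name is an empty string.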