Pyspark - Transpose


In PySpark, I have a dataset as below:

+-----------+-----------+
|weekend_day|totals     |
+-----------+-----------+
| 2023-02-25|  401943676|
| 2023-03-11|  410220150|
+-----------+-----------+

and the expected output is

+--------+------------+------------+
|        | 2023-02-25 | 2023-03-11 |
+--------+------------+------------+
| totals | 401943676  | 410220150  |
+--------+------------+------------+

pivot is not producing this result. Please advise how it can be achieved.

Please note that I don't want to use Pandas.

Thank you


Solution

  • I'm not sure what you mean by "pivot is not providing the result".

    df = spark.createDataFrame(
        [('2023-02-25', 401943676), ('2023-03-11', 410220150)],
        schema=['weekend_day', 'totals']
    )
    df.printSchema()
    df.show(3, False)
    +-----------+---------+
    |weekend_day|totals   |
    +-----------+---------+
    |2023-02-25 |401943676|
    |2023-03-11 |410220150|
    +-----------+---------+
    

    You can use groupBy and pivot to achieve the expected output:

    from pyspark.sql import functions as func

    df.groupBy(
        func.lit('total').alias('col_name')
    ).pivot(
        'weekend_day'
    ).agg(
        func.first('totals')
    ).show(
        10, False
    )
    +--------+----------+----------+
    |col_name|2023-02-25|2023-03-11|
    +--------+----------+----------+
    |total   |401943676 |410220150 |
    +--------+----------+----------+
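
If the real dataset has more than one value column (an assumption, since the question only shows totals), the same groupBy/pivot idea can be combined with Spark SQL's stack() to unpivot the value columns first. The sketch below is one possible way to do it, not the only one; the helper names build_stack_expr and transpose, and the extra column orders in the usage note, are made up for illustration.

```python
from typing import List


def build_stack_expr(value_cols: List[str]) -> str:
    """Build a Spark SQL stack() expression that turns each value column into a row."""
    # Each pair is ('column name as a literal', `column reference`).
    pairs = ", ".join(f"'{c}', `{c}`" for c in value_cols)
    return f"stack({len(value_cols)}, {pairs}) as (col_name, value)"


def transpose(df, key_col: str, value_cols: List[str]):
    """Return one row per value column and one column per distinct key_col value."""
    # Imported here so build_stack_expr can be used without Spark installed.
    from pyspark.sql import functions as func

    # Step 1: unpivot to long format with columns (key_col, col_name, value).
    long_df = df.selectExpr(f"`{key_col}`", build_stack_expr(value_cols))

    # Step 2: pivot the key values into columns, as in the answer above.
    return long_df.groupBy("col_name").pivot(key_col).agg(func.first("value"))
```

With the DataFrame from the answer, transpose(df, 'weekend_day', ['totals']).show() should give the same shape as the output above, with totals in col_name. With a second, hypothetical column, the call would be transpose(df, 'weekend_day', ['totals', 'orders']). Note that stack() expects the value columns to share a compatible type, so mixed types may need a cast first.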