In PySpark I have a dataset as below:
+-----------+-----------+
|weekend_day|     totals|
+-----------+-----------+
| 2023-02-25|  401943676|
| 2023-03-11|  410220150|
+-----------+-----------+
and the expected output is
------------------------------------
|        | 2023-02-25 | 2023-03-11 |
| totals |  401943676 |  410220150 |
pivot is not providing the result. Please advise how this can be achieved.
Please note I don't want to use Pandas.
Thank you
Not sure what you mean by "pivot is not providing the result"?
df = spark.createDataFrame(
    [('2023-02-25', 401943676), ('2023-03-11', 410220150)],
    schema=['weekend_day', 'totals']
)
df.printSchema()
df.show(3, False)
+-----------+---------+
|weekend_day|totals   |
+-----------+---------+
|2023-02-25 |401943676|
|2023-03-11 |410220150|
+-----------+---------+
You can use groupBy and pivot to achieve the expected output:
from pyspark.sql import functions as func
df.groupBy(
    func.lit('total').alias('col_name')
).pivot(
    'weekend_day'
).agg(
    func.first('totals')
).show(
    10, False
)
+--------+----------+----------+
|col_name|2023-02-25|2023-03-11|
+--------+----------+----------+
|total   |401943676 |410220150 |
+--------+----------+----------+
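As a side note, if you already know which weekend_day values to expect, you can pass them to pivot as an explicit list so Spark skips the extra job that collects the distinct pivot values, and drop the helper col_name column if you only want the date columns. A minimal sketch, assuming the same df and func alias as above:

result = df.groupBy(
    func.lit('total').alias('col_name')
).pivot(
    'weekend_day', ['2023-02-25', '2023-03-11']  # explicit values: avoids an extra distinct pass
).agg(
    func.first('totals')
).drop('col_name')  # keep only the pivoted date columns

result.show(10, False)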