Search code examples
pythondataframepysparkpivottranspose

Pyspark Pivot table with Multiple columns


I am trying to pivot a dataframe with one key and multiple values across different columns . How do I do this in pyspark ? I have used pivot with one key value pair before and trying to figure this out.

Sample dataframe

id test_id test_status key score1 score2 score3
ABC 1 complete q1 1 2 3
ABC 1 complete q2 4 5 6
ABC 2 complete q1 1 6 7
ABC 2 complete q2 5 6 7

expected dataframe

id test_id test_status q1_score1 q1_score2 q1_score3 q2_score1 q2_score2 q2_score3
ABC 1 complete 1 2 3 4 5 6
ABC 2 complete 1 6 7 5 6 7

Solution

  • You can do multiple columns pivot.

    df = (df.groupby('id', 'test_id', 'test_status')
          .pivot('key')
          .agg(*[F.first(x).alias(x) for x in ['score1', 'score2', 'score3']]))