Search code examples
pythonazurepysparkpivot

Attempting to pivot only half a dataset via Python


I have an interesting situation whereby I need to pivot (if that's the actual term) some data but effectively on only half of the dataset eg

Current example dataset

Name--------Address----------DOB----Ind1----Ind2----Ind3

Joe Bloggs---2 Fleet Street-----270784--7--------25.6-----14

Pete Gribby--32 Egg Road------050577-13-------21.6-----11

Required output

Name--------Address---------DOB-----Value-----ColName
Joe Bloggs---2 Fleet Street----270784---7---------Ind1
Joe Bloggs---2 Fleet Street----270784---25.6------Ind2
Joe Bloggs---2 Fleet Street----270784---14--------Ind3
Pete Gribby--32 Egg Road-----050577---13--------Ind1
Pete Gribby--32 Egg Road-----050577---21.6------Ind2
Pete Gribby--32 Egg Road-----050577---11--------Ind3

I'm not usually a Python person, in fact have very little experience with it, but due to the technology I'm currently using I can't use .Net Spark.

I attempted to mock something up in C# to get a feel of the process required, mainly taking the "Ind" columns, pivoting them then creating a new dataset with the repeating data, but that just got ugly.

I've also had a look at the explode function though similar questions on here, but can't quite see how to get things moving in this example

So my question is, does anyone have any points/advice/code snippets in Python to help with this conundrum?

Regards


Solution

  • You use below unpivot expression for your required results.

    df.select("Name", "Address", "DOB",expr("stack(3,'Ind1',cast(Ind1 as  double),'Ind2', cast(Ind2 as  double),'Ind3',cast(Ind3 as  double)) as (ColName,Value)"))
    

    enter image description here