Search code examples
python-3.xpandaspysparkdelimiterunpivot

split a column based on a delimiter and then unpivot the result with preserving other columns


I need to split a column to multiple rows and then unpivot them by preseving a/multiple columns, how can I achive this in Python3

See below example

import numpy as np
data=np.array(['a0','a1,a2','a2,a3'])
pk=np.array([1,2,3])
df=pd.DataFrame({'data':data,'PK':pk})
df

enter image description here

df['data'].apply(lambda x : pd.Series(str(x).split(","))).stack()

enter image description here

What I need is:

data  pk
 a0   1
 a1   2
 a2   2
 a2   3
 a3   3

Is there any way to achieve this without merge and resetting indexes as mentioned here?


Solution

  • Using the Explode is the keyword (thx to wwnde for pointing it out) for searching this and can be done easily in Python with using existing libraries

    First step is converting the column with a delimiter to a list

    df=df.assign(Data=df.data.str.split(","))
    

    enter image description here

    and then explode

    df.explode('Data')
    

    enter image description here

    if you are reading from Excel and Pandas detect a list of number as int and if you need to do the explode multiple times then this is the code and results

    enter image description here