Pandas 0.19.0 explode() workaround


Good day, everyone!

I need an alternative to, or a workaround for, explode() in pandas 0.19.0. I have this CSV file:

  item        CODE
0 apple       REDGRNYLW
1 strawberry  REDWHT
2 corn        YLWREDPRLWHTPNK

I need to get this result:

  item        CODE
1 apple       RED
2 apple       GRN
3 apple       YLW
4 strawberry  RED
5 strawberry  WHT
6 corn        YLW
7 corn        RED
8 corn        PRL
9 corn        WHT
10 corn       PNK

I managed to get this result with pandas 1.3.3. Here is what I did:

    import pandas as pd

    filename = r'W:\plant_CODE.csv'

    df2 = pd.read_csv(filename)

    def split_every_3_char(string):
        # Split a string into consecutive 3-character chunks
        return [string[i:i+3] for i in range(0, len(string), 3)]

    df2.columns = ['item', 'CODE']
    # Index by every column except CODE, turn each CODE into a list of
    # 3-character codes, then give each list element its own row
    df_splitted = (df2.set_index(df2.columns.drop('CODE').tolist())
        .CODE.apply(split_every_3_char)
        .explode()
        .to_frame()
        .reset_index()
    )

    print(df_splitted)

Unfortunately, I just realized that I'm limited to pandas 0.19.0, and Series.explode() isn't available there (it was added in pandas 0.25.0):

    Traceback (most recent call last):
      File "<string>", line 69, in <module>
      File "lib\site-packages\pandas\core\generic.py", line 2744, in __getattr__
    AttributeError: 'Series' object has no attribute 'explode'

I would appreciate any solution or workaround. Thank you!

Solution

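  • Convert the output of the function to a Series, so that apply returns a DataFrame (one column per 3-character code, padded with NaN for shorter rows), then use DataFrame.stack, which drops the NaN padding and yields one row per code: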

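    df_splitted = (df2.set_index(df2.columns.drop('CODE').tolist())
        # one column per 3-character code; shorter rows are padded with NaN
        .CODE.apply(lambda x: pd.Series(split_every_3_char(x)))
        # move those columns into the index, dropping the NaN padding
        .stack()
        # drop the helper level created by stack, keep 'item'
        .reset_index(-1, drop=True)
        .reset_index(name='CODE')
    )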
    
    print(df_splitted)
             item CODE
    0       apple  RED
    1       apple  GRN
    2       apple  YLW
    3  strawberry  RED
    4  strawberry  WHT
    5        corn  YLW
    6        corn  RED
    7        corn  PRL
    8        corn  WHT
    9        corn  PNK
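If you need the same trick elsewhere, the part of explode() used here can also be rebuilt with NumPy. This is a minimal sketch under stated assumptions: explode_compat is a hypothetical helper name (not a pandas API), every CODE cell is assumed to split into a non-empty list, and the sample frame is typed in rather than read from the CSV. It only uses calls that should already exist in 0.19.0 (Series.apply, Index.repeat, numpy.concatenate), but it has not been tested on that version.

```python
import numpy as np
import pandas as pd


def split_every_3_char(string):
    # Split a string into consecutive 3-character chunks
    return [string[i:i+3] for i in range(0, len(string), 3)]


def explode_compat(s):
    # Hypothetical helper: a rough stand-in for Series.explode() on pandas < 0.25.
    # Assumes every element of `s` is a non-empty list.
    lengths = s.apply(len).values       # number of items in each list
    values = np.concatenate(s.values)   # all list items, flattened in order
    index = s.index.repeat(lengths)     # repeat each index label once per item
    return pd.Series(values, index=index, name=s.name)


# Sample data from the question, built inline instead of read from the CSV
df2 = pd.DataFrame({'item': ['apple', 'strawberry', 'corn'],
                    'CODE': ['REDGRNYLW', 'REDWHT', 'YLWREDPRLWHTPNK']},
                   columns=['item', 'CODE'])

codes = df2.set_index(df2.columns.drop('CODE').tolist()).CODE.apply(split_every_3_char)
df_splitted = explode_compat(codes).to_frame().reset_index()
print(df_splitted)
```

Instead of building one small Series per row as the stack approach does, this flattens all lists with a single numpy.concatenate call and repeats each index label by its list length. Empty lists are not handled; Series.explode would emit a NaN row for them.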