python pandas database dataframe data-cleaning

Pandas Parsing by cell


I have a dataframe with n columns and n rows. Some of the cells contain multiple values separated by ";". I can't figure out how to go through every cell in the dataframe and, whenever I encounter such a cell, separate it into multiple cells.

[Image: Example of the problem I am encountering]

The image above is from a Google Sheet, but I need a solution for a pandas dataframe.

I appreciate any help in advance, thank you :)


Solution

  • df:

    1       2    3    4
    a;b;d;  a;b  g;a  a
    c;f     f    e    g
    e       d
    

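    Try doing it with explode, one column at a time: strip the trailing ";", split each cell on ";", explode the resulting lists into rows, then reassemble the columns into a new dataframe: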

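    import pandas as pd
    # Per column: drop the trailing ';', split on ';', then put each piece on its own row
    exploded = [df[col].str.rstrip(';').str.split(';').explode().reset_index(drop=True) for col in df.columns]
    df2 = pd.DataFrame(dict(zip(df.columns, exploded)))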
    

    df2:

       1    2     3     4
    0  a    a     g     a
    1  b    b     a     g
    2  d    f     e  None
    3  c    d  None   NaN
    4  f  NaN   NaN   NaN
    5  e  NaN   NaN   NaN
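
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt by hand from the example above (assuming the empty cells are missing values), and `split_cells` is a hypothetical helper name, not part of pandas. Unlike the one-liner, it drops missing cells before splitting, so the short columns are padded with NaN only, and it casts cells to strings first on the assumption that some columns may not be text.

```python
import pandas as pd

# Sample frame rebuilt from the example above; the empty cells are assumed to be None.
df = pd.DataFrame({
    "1": ["a;b;d;", "c;f", "e"],
    "2": ["a;b", "f", "d"],
    "3": ["g;a", "e", None],
    "4": ["a", "g", None],
})


def split_cells(frame, sep=";"):
    """Split every cell on `sep` and stack the pieces column by column.

    Hypothetical helper for illustration; each column is handled
    independently, so rows no longer line up with the original frame.
    """
    columns = {}
    for col in frame.columns:
        columns[col] = (
            frame[col]
            .dropna()              # skip missing cells instead of exploding them
            .astype(str)           # assumption: non-string cells should be treated as text
            .str.rstrip(sep)       # drop a trailing separator such as "a;b;d;"
            .str.split(sep)        # each cell becomes a list of pieces
            .explode()             # one piece per row
            .reset_index(drop=True)
        )
    # Series of different lengths are aligned on their index and padded with NaN.
    return pd.DataFrame(columns)


df2 = split_cells(df)
print(df2)
```

Because each column is exploded on its own, the original row relationships are lost; if the values in a row need to stay together, exploding one column at a time with `df.explode(col)` is probably a better fit.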