Search code examples
pythonexcelpandasnumpysplit

How to split columns into two sub columns under their parent column using python in excel


Need suggestion in python. I have an excel sheet with 20 columns having values like 12-->23 i need to split all the columns into two pre and post where pre will have value 12 and post will have value 23 and so on ... and also these column should be exactly under their respective parent column using pandas

Input:

| Column A | 
| -------- | 
| 12 --> 23|
| 13 --> 24|

Output

| column A |
|pre| |post|
| 12| | 23 |
| 13| | 24 |

There are lots of column so i can't directly use column name

I tried the following code it worked if i know the column name but not if i have to loop the columns without their name

df = pd.read_excel('path/to/excel_file.xlsx')

for col in df.columns:
    new_cols = df[col].str.split(expand=True)
    
    df[col + '_1'] = new_cols[0]
    df[col + '_2'] = new_cols[1]
df.drop(columns=df.columns[:-2], inplace=True)

It's not working for me


Solution

  • You can use str.split:

    df[['pre', 'post']] = df['Column A'].str.split(r'\s*-->\s*', expand=True)
    print(df)
    
    # Output
        Column A pre post
    0  12 --> 23  12   23
    1  13 --> 24  13   24
    

    So for multiple columns:

    data = []
    
    for col in df.columns:
        new_cols = df[col].str.split(r'\s*-->\s*', expand=True)
        if new_cols.shape[1] == 2:
            df = df.drop(columns=col)
            data.append(new_cols.add_prefix(col + '_'))
    df = pd.concat([df, *data], axis=1)
    

    Output:

    >>> df
      Column A_0 Column A_1
    0         12         23
    1         13         24