Tags: python, pandas, assign

A more Pythonic way to split a column into multiple columns and sum two of them


Sample code:

import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]})

Goal:

df = pd.DataFrame({'id': [1, 2, 3], 'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]], 'x1': [1, 5, 9], 'y1': [2, 6, 10], 'x2': [4, 12, 20], 'y2': [6, 14, 22]})

In words, I want to add four integer columns to the dataframe: the first two are simply the first two elements of each list in bbox, and the last two are, respectively, the sum of the first and third elements and the sum of the second and fourth elements. Currently, I do this:

df[['x1', 'y1', 'w', 'h']] = pd.DataFrame(df['bbox'].values.tolist(), index=df.index).astype(int)
df = df.assign(x2=df['x1'] + df['w'], y2=df['y1'] + df['h'])  # assign returns a new DataFrame
df = df.drop(['w', 'h'], axis=1)  # drop also returns a new DataFrame

It seems a bit convoluted to me. Is there a way to avoid creating the intermediate columns w and h, or would that make the code less readable? Readability is a higher priority for me than saving a line of code, so if there are no readable alternatives, I'll settle for this solution.


Solution

  • I think you can create x2 and y2 in the first step:

    df1 = pd.DataFrame(df['bbox'].values.tolist(), index=df.index).astype(int)
    df[['x1', 'y1', 'x2', 'y2']] = df1
    df = df.assign(x2=df['x1'] + df['x2'], y2=df['y1'] + df['y2'])

    print(df)
       id                     bbox  x1  y1  x2  y2
    0   1     [1.0, 2.0, 3.0, 4.0]   1   2   4   6
    1   2     [5.0, 6.0, 7.0, 8.0]   5   6  12  14
    2   3  [9.0, 10.0, 11.0, 12.0]   9  10  20  22
    

    Or use +=:

    df1 = pd.DataFrame(df['bbox'].values.tolist(), index=df.index).astype(int)
    df[['x1', 'y1', 'x2', 'y2']] = df1
    df['x2'] += df['x1']
    df['y2'] += df['y1']
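
As a possible alternative to the two variants above (this is not the answerer's method, just a sketch), the sums could be computed on a NumPy array before anything is written back to the dataframe, so that no placeholder columns are ever created. It assumes every bbox list has exactly four elements in the order x1, y1, w, h; the names boxes and corners are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'bbox': [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]],
})

# Assumption: each bbox holds exactly four numbers (x1, y1, w, h),
# so the lists stack into an (n_rows, 4) array.
boxes = np.array(df['bbox'].tolist()).astype(int)

# Turn width/height into the far corner in place: x2 = x1 + w, y2 = y1 + h.
boxes[:, 2:] += boxes[:, :2]

# Build the four columns at once, aligned on the original index, and attach them.
corners = pd.DataFrame(boxes, columns=['x1', 'y1', 'x2', 'y2'], index=df.index)
df = df.join(corners)

print(df)
```

Whether this reads better than the += version is a matter of taste: it avoids temporarily storing w and h under the names x2 and y2, at the cost of an extra import and a slicing step that readers need to parse.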