Search code examples
pythonpandasdataframeextractmultiple-columns

Create df column based on other columns in df


Create len column based on num_type_len by matching with num on two columns.

num_type_len Actual_num
[8812_CHECKING_90, 7094_SAVINGS_75, 9939_CHECKING_89] 7094
[6846_CHECKING_87, 1906_CHECKING_90] 1906

Expected output:-


| Report_length | Actual_num |
| ------------- | ---------- |
| 75            | 7094       |
| 90            | 1906       |


Solution

  • You can compare splitted values of lists by Actual_num converted to strings and get first match values by next with iter trick:

    df['Report_length'] = [next(iter([z.split('_')[-1] 
                           for z in x if z.split('_')[0] == str(y)]), None) 
                           for x, y in zip(df['num_type_len'], df['Actual_num'])]
    
    df = df[['Report_length','Actual_num']]
    print (df)
      Report_length  Actual_num
    0            75        7094
    1            90        1906
    

    Or use DataFrame.explode with lists column with compare splitted values:

    df1 = df.explode('num_type_len')
    
    df2 = (df1['num_type_len'].str.split('_', expand=True)
                              .rename(columns={2:'Report_length'})
                              .assign(Actual_num = df1['Actual_num']))
    
    df = df2.loc[df2[0].eq(df2['Actual_num'].astype(str)), ['Report_length', 'Actual_num']]
    print (df)
      Report_length  Actual_num
    0            75        7094
    1            90        1906