Search code examples
pythonpandasdataframedata-manipulation

Trying to dynamically filled columns based on list values and len


I have been trying to optimize as much as possible a data manipulation that takes two parts. I am working a pandas column where each value is a list with different length or NaN values and I want to extract the values to put them on column based on their position in the list.

Here is a reproductible example and what I have coded so far:

df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
               'last_name': ['George', 'George2', 'George3'],
               'banking_number': [['NaN'], ['UK421'], ['UK123', 'FR789']]}
             )
len_banking_number = max(
    df.loc[
        df[
            'banking_number'
        ].notna()
    ]['banking_number'].str.len()
)
len_list = list(
    range(
        1, len_banking_number + 1
    )
)

for i in len_list:
  df[f'bank_{i}'] = np.nan

How would you get this result?

    df_final = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                   'last_name': ['George', 'George2', 'George3'],
                   'banking_number': [['NaN'], ['UK421'], ['UK123', 'FR789']],
                   'bank_1': ['NaN', 'UK', 'UK'],
                   'bank_2': ['NaN', 'NaN', 'FR']}
                 )

Thank you for your time and help


Solution

  • You can try this:

    import pandas as pd
    
    df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                   'last_name': ['George', 'George2', 'George3'],
                   'banking_number': [[None], ['UK421'], ['UK123', 'FR789']]}
                 )
    
    # split the banking_number column
    df[['bank_1','bank_2']] = pd.DataFrame(df['banking_number'].tolist(), index= df.index)
    
    # keep only the first 2 characters (maybe not needed but I wanted to match your expected output)
    def get_first_2_char(x):
        x=x[0:2] if x else x
        return x
    
    for col in ['bank_1', 'bank_2']:
        df[col] = df[col].apply(lambda x: get_first_2_char(x))
    

    output: output

    EDIT

    Following @itprorh86 remark, below is an updated version that can handle a different number of banks.

    import pandas as pd
    
    df = pd.DataFrame({'first_name': ['Charles', 'Charles2', 'Charles3'],
                   'last_name': ['George', 'George2', 'George3'],
                   'banking_number': [[None], ['UK421'], ['UK123', 'FR789']]}
                 )
    
    max_len = max(df['banking_number'].apply(lambda x: len(x)))
    column_names = [f'bank_{i}' for i in range(1, max_len+1)]
    
    # split the banking_number column
    df[column_names] = pd.DataFrame(df['banking_number'].tolist(), index= df.index)
    
    # keep only the first 2 characters
    def get_first_2_char(x):
        x=x[0:2] if x else x
        return x
    
    for col in column_names:
        df[col] = df[col].apply(lambda x: get_first_2_char(x))