Search code examples
pythonpandasdataframecalculated-columns

add columns by conditional


I have the following data frame:

import pandas as pd

data = {"A1": [22,25,27,35],
        "A2": [23,54,71,52],
        "A3": [21,27,26,31],
        "A1_L1": [21,27,26,31],
        "A1_L2": [21,27,26,31],
        "A2_L1": [21,27,26,31],
        "A3_L1": [21,27,26,31],
        "A3_L2": [21,27,26,31],
        "A3_L3": [21,27,26,31]
        }
df = pd.DataFrame(data, columns = ["A1", "A2","A3","A1_L1","A1_L2","A2_L1","A3_L1","A3_L2","A3_L3"])

I want to add automaticaly a new column for each column with an L in it. The information in this column will be the subtraction column A - A_L

For example,

A1_L1_new=A1-A1_L1 
A1_L2_new=A1-A1_L2  
A2_L1_new=A2-A2_L1
A3_L1_new=A3-A3_L1
A3_L2_new=A3-A3_L2
A3_L3_new=A3-A3_L3

Any help will be appreciated.


Solution

  • First, import library regex:

    import re
    

    Then, take columns' names as list.

    col_names = df.columns.tolist()
    
    1. For each column name, check if there is L in it.
    2. If it exists, separate with regex the A and the L.
    3. Then create the string based on these As and Ls.
    4. Finally, create the column with the string name we just created, and subtract the columns of the corresponding A and L.
    for i in col_names:
        if 'L' in i: #checks if name contains 'L'
            x = [x.group() for x in re.finditer(r'[A,L][1-9]', i)] # place in list, the 'A' and 'L'    
            foo = str(x[0]) + "-" + str(x[1]) + '_new' # Create new name, based on 'A' and 'L'
            df[foo] = df[x[0]] - df[i] # Subtract the corresponding columns.
    
    df
    

    In case you wonder what regex does:

    • [A,L] finds a match that starts with A or L,
    • [1-9] matches any digit. (you can also write instead \d, that matches any digit)