Search code examples
pythonpandasdataframefunctionboolean-expression

Create new data frame columns with return in function


In the DataFrame below, I need to check the shop column and add two columns based on shop

datetime dtnum time tnum shop
02-03-2022 05:25 20220302052500 05:25:00 52500 PRESS1
02-03-2022 05:26 20220302052600 05:26:00 52600 BODY1
02-03-2022 05:27 20220302052700 05:27:00 52700 BODY2

I tried with below code. that is using same return to make two columns

# Apply shop_groupcode
def shop_chk(column):
    if column['shop'] == 'PRESS1' or 'PRESS1_A' or 'E176' or 'E177' or 'E184' or 'E185' or 'E186' or 'E187':
        return 1, 1  # shop code , plant code
    if column['shop'] == 'BODY1' or 'BODY1_A' or 'E179' or 'E180' or 'E181' or 'E208' or \
            'E216' or 'E217' or 'E218' or 'E232':
        return 2, 1 # shop code , plant code
    if column['shop'] == 'BODY2' or 'BODY2_A' or 'E196' or 'E197' or 'E198' or 'E199' or 'E200':
        return 8, 2 # shop code , plant code

df['shop_code', 'plant_code'] = df.apply(shop_chk, axis=1, result_type="expand")

Code runs without error, two columns are created- but the column values are all 1

Requirement: I want to know if there is mistake in code or if there is any efficient method as I have some more shop conditions to check

datetime dtnum time tnum shop shop_code plant_code
02-03-2022 05:25 20220302052500 05:25:00 52500 PRESS1 1 1
02-03-2022 05:26 20220302052600 05:26:00 52600 BODY1 2 1
02-03-2022 05:27 20220302052700 05:27:00 52700 BODY2 8 2

Solution

  • You can use isin instead of multiple ORs and store those conditions in a list and use numpy.select:

    import numpy as np
    conditions = [df['shop'].isin(['PRESS1','PRESS1_A','E176','E177','E184','E185','E186','E187']), 
                  df['shop'].isin(['BODY1','BODY1_A','E179','E180','E181','E208','E216','E217','E218','E232']),
                  df['shop'].isin(['BODY2','BODY2_A','E196','E197','E198','E199','E200'])
                 ]
    df['shop_code'] = np.select(conditions, [1, 2, 8])
    df['plant_code'] = np.select(conditions, [1, 1, 2])
    

    Output:

               datetime           dtnum      time   tnum    shop  shop_code  plant_code  
    0  02-03-2022 05:25  20220302052500  05:25:00  52500  PRESS1          1           1  
    1  02-03-2022 05:26  20220302052600  05:26:00  52600   BODY1          2           1
    2  02-03-2022 05:27  20220302052700  05:27:00  52700   BODY2          8           2
    

    FYI, the correct syntax is:

    (column['shop'] == 'PRESS1') or (column['shop'] == 'PRESS1_A') or ...
    

    because

    column['shop'] == 'PRESS1' or 'PRESS1_A' or ...
    

    returns either True or 'PRESS1_A' (not a truth-value).