Search code examples
pythonpandasfillna

Fillna by relating multiple columns using a function


I have 3 columns in the dataframe. object, id and price. I want fill the blanks by reading the id column and discover which price should I use. For exemple: If the id ends in (A,B or C) the price should be 30 but if it's end (7A,7B or 7C) the price should be 50, If the id ends in (E,F or G) the price should be 20, If the id ends in (O,M or N), the price should be 10.

Here is the dataframe:

    object    id  price
0   laptop   24A   30
1   laptop   37C   NaN
2   laptop   21O   NaN
3   laptop   17C   50
4   laptop   55A   30
5   laptop   34N   NaN
6   laptop   05E   20
7   laptop   29B   NaN
8   laptop   22M   10
9   laptop   62F   NaN
10  laptop   23G   20
11  laptop   61O   NaN
12  laptop   27A   NaN

Expected output:

    object    id  price
0   laptop   24A   30
1   laptop   37C   50
2   laptop   21O   10
3   laptop   17C   50
4   laptop   55A   30
5   laptop   34N   10
6   laptop   05E   20
7   laptop   29B   30
8   laptop   22M   10
9   laptop   62F   20
10  laptop   23G   20
11  laptop   61O   10
12  laptop   27A   50

Solution

  • You can use np.select with str.contains conditions:

    conditions = {
        30: df.id.str.contains('[^7][ABC]$'),
        50: df.id.str.contains('7[ABC]$'),
        20: df.id.str.contains('[EFG]$'),
        10: df.id.str.contains('[OMN]$'),
    }
    df.price = np.select(conditions.values(), conditions.keys())
    
    #     object   id  price
    # 0   laptop  24A     30
    # 1   laptop  37C     50
    # 2   laptop  21O     10
    # 3   laptop  17C     50
    # 4   laptop  55A     30
    # 5   laptop  34N     10
    # 6   laptop  05E     20
    # 7   laptop  29B     30
    # 8   laptop  22M     10
    # 9   laptop  62F     20
    # 10  laptop  23G     20
    # 11  laptop  61O     10
    # 12  laptop  27A     50
    

    You could also use loc masking if you want to use fillna:

    for price, condition in conditions.items():
        df.loc[condition, 'price'] = df.loc[condition, 'price'].fillna(price)
    

    Update 1

    If you want to further restrict by df.object, you can add the df.object condition with &:

    conditions = {
        30: df.object.eq('laptop') & df.id.str.contains('[^7][ABC]$'),
        50: df.object.eq('laptop') & df.id.str.contains('7[ABC]$'),
        20: df.object.eq('laptop') & df.id.str.contains('[EFG]$'),
        10: df.object.eq('laptop') & df.id.str.contains('[OMN]$'),
        1000: df.object.eq('phone') & df.id.str.contains('[OMN]$'),
    }
    

    Update 2

    If you really want to use a function, you can apply along rows (axis=1), but row-apply is much slower and not advised when you have vectorized options like np.select:

    def price(row):
        result = np.nan
        if row.object == 'laptop':
            if row.id[-2:] in ['7A', '7B', '7C']:
                result = 50
            elif row.id[-1] in list('ABC'):
                result = 30
            elif row.id[-1] in list('EFG'):
                result = 20
            elif row.id[-1] in list('OMN'):
                result = 10
        elif row.object == 'phone':
            if row.id[-2:] in ['7A', '7B', '7C']:
                result = 5000
            ...
        return result
    df.price = df.apply(price, axis=1)