Tags: python, pandas, dataframe, dictionary, calculated-columns

Apply a pandas dictionary with gt/lt conditions as keys


I have created the following pandas dataframe:

import pandas as pd

ds = {'col1':[1,2,2,3,4,5,5,6,7,8]}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)

   col1
0     1
1     2
2     2
3     3
4     4
5     5
6     5
7     6
8     7
9     8

I then created a new column, called newCol, defined as follows:

def criteria(row):
    # Chained comparisons replace the bitwise & on scalar booleans
    if 0 < row['col1'] <= 2:
        return "A"
    elif 2 < row['col1'] <= 3:
        return "B"
    else:
        return "C"
    
df['newCol'] = df.apply(criteria, axis=1)

The new dataframe looks like this:

print(df)

   col1 newCol
0     1      A
1     2      A
2     2      A
3     3      B
4     4      C
5     5      C
6     5      C
7     6      C
8     7      C
9     8      C

Is it possible to define the same rules in a dictionary like this:

dict = {
        
        '0 <= 2' : "A",
        '2 <= 3' : "B",
        'Else' : "C"

        }

And then apply it to the dataframe:

df['newCol'] = df['col1'].map(dict)

?

Can anyone help me please?


Solution

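  • Yes, you could do this with an IntervalIndex. Its intervals are closed on the right by default, which matches your "> low and <= high" conditions: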

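    # Each key is a (low, high) pair, read as low < value <= high
    dic = {(0, 2): 'A',
           (2, 3): 'B',
          }
    other = 'C'  # label for values that match no interval

    bins = pd.IntervalIndex.from_tuples(list(dic))
    labels = list(dic.values())

    # reindex looks up the interval containing each value; misses become NaN
    df['newCol'] = (pd.Series(labels, index=bins)
                      .reindex(df['col1']).fillna(other)
                      .tolist()
                   )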
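To see how the interval lookup treats the boundaries, here is a minimal sketch. It assumes the same two bins as above and a few hand-picked test values (0 through 4) that are not part of the original data; `closed='right'` is the default and is only spelled out for clarity.

```python
import pandas as pd

# Same bins as above; closed='right' is the default and is spelled out here
bins = pd.IntervalIndex.from_tuples([(0, 2), (2, 3)], closed='right')

# get_indexer gives the position of the interval containing each value,
# or -1 when no interval contains it
positions = bins.get_indexer([0, 1, 2, 3, 4])
print(positions.tolist())
```

A value of 0 is not matched because the left edge is open, while 2 falls in the first interval and 3 in the second. Unmatched values are the ones that end up with the fallback label.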
    

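    But given your example, it seems more straightforward to go with cut. Note that the result is a categorical column, and that values outside the bins (0 or below) become NaN rather than "C":

    import numpy as np

    df['newCol'] = pd.cut(df['col1'], bins=[0, 2, 3, np.inf], labels=['A', 'B', 'C'])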
    

    Output:

       col1 newCol
    0     1      A
    1     2      A
    2     2      A
    3     3      B
    4     4      C
    5     5      C
    6     5      C
    7     6      C
    8     7      C
    9     8      C
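If the column may also contain values at or below 0, the cut approach would not reproduce the "else" branch of the original function. One possible workaround, sketched here under the assumption that the fallback label is always "C" and using made-up sample values, is to bin only the explicit rules and fill everything else afterwards:

```python
import pandas as pd

# Hypothetical sample that includes values outside every bin
df = pd.DataFrame({'col1': [-1, 0, 1, 2, 3, 4]})

# Only the explicit rules go into the bins: (0, 2] -> 'A', (2, 3] -> 'B'
binned = pd.cut(df['col1'], bins=[0, 2, 3], labels=['A', 'B'])

# Values outside the bins are NaN; convert to plain objects and fill the fallback
df['newCol'] = binned.astype(object).fillna('C')
print(df)
```

Converting to object before filling avoids having to register "C" as an extra category first.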
    

    If you prefer to keep your original dictionary format, you could convert it using:

    dic = {'0 <= 2' : "A",
           '2 <= 3' : "B",
           'Else' : "C"
    }
    
    dic2 = {tuple(map(int, k.split(' <= '))): v for k, v in dic.items()
            if k != 'Else'}
    # {(0, 2): 'A', (2, 3): 'B'}
    other = dic['Else']
    # dic2 and other can now take the place of dic and other in the IntervalIndex approach
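Putting the conversion and the interval lookup together, a minimal end-to-end sketch could look like the following. The helper name `apply_rules` and its `fallback_key` parameter are hypothetical, introduced only for illustration, and the keys are assumed to always have the form 'low <= high' with integer bounds.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]})

rules = {'0 <= 2': 'A', '2 <= 3': 'B', 'Else': 'C'}


def apply_rules(values, rules, fallback_key='Else'):
    """Hypothetical helper: label each value by the (low, high] rule containing it."""
    # Parse 'low <= high' keys into (low, high) tuples, skipping the fallback entry
    parsed = {tuple(map(int, k.split(' <= '))): v
              for k, v in rules.items() if k != fallback_key}
    bins = pd.IntervalIndex.from_tuples(list(parsed))
    labels = pd.Series(list(parsed.values()), index=bins)
    # reindex finds the interval containing each value; misses become NaN
    return labels.reindex(values).fillna(rules[fallback_key]).tolist()


df['newCol'] = apply_rules(df['col1'], rules)
print(df)
```

This only works when the intervals do not overlap, since the lookup needs each value to match at most one rule.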