Search code examples
pythonpandasdataframecategoriesbinning

Mapping ranges of values in pandas dataframe


Apologies if this has been asked before, but I looked extensively without results.

import pandas as pd    
import numpy as np    
df = pd.DataFrame(data = np.random.randint(1,10,10),columns=['a'])    

   a
0  7
1  8
2  8
3  3
4  1
5  1
6  2
7  8
8  6
9  6

I'd like to create a new column b that maps several values of a according to some rule, say a=[1,2,3] is 1, a = [4,5,6,7] is 2, a = [8,9,10] is 3. one-to-one mapping is clear to me, but what if I want to map by a list of values or a range?

I tought along these lines...

df['b'] = df['a'].map({[1,2,3]:1,range(4,7):2,[8,9,10]:3})

Solution

  • There are a few alternatives.

    Pandas via pd.cut / NumPy via np.digitize

    You can construct a list of boundaries, then use specialist library functions. This is described in @EdChum's solution, and also in this answer.

    NumPy via np.select

    df = pd.DataFrame(data=np.random.randint(1,10,10), columns=['a'])
    
    criteria = [df['a'].between(1, 3), df['a'].between(4, 7), df['a'].between(8, 10)]
    values = [1, 2, 3]
    
    df['b'] = np.select(criteria, values, 0)
    

    The elements of criteria are Boolean series, so for lists of values, you can use df['a'].isin([1, 3]), etc.

    Dictionary mapping via range

    d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}
    
    df['c'] = df['a'].apply(lambda x: next((v for k, v in d.items() if x in k), 0))
    
    print(df)
    
       a  b  c
    0  1  1  1
    1  7  2  2
    2  5  2  2
    3  1  1  1
    4  3  1  1
    5  5  2  2
    6  4  2  2
    7  4  2  2
    8  9  3  3
    9  3  1  1