python-3.x, pandas, binning

Binning with pd.cut beyond the bin range (replacing NaN with "<min_val" or ">max_val")


import pandas as pd

df = pd.DataFrame({'days': [0, 31, 45, 35, 19, 70, 80]})
df['range'] = pd.cut(df['days'], [0, 30, 60])
df

The code above reproduces the problem: pd.cut is used to convert a numerical column to a categorical one. pd.cut assigns categories according to the list of bin edges passed, here [0, 30, 60]. Rows 0, 5, and 6 come out as NaN because their values (0, 70, 80) fall outside the intervals (0, 30] and (30, 60]. What I want is for 0 to be categorized as <0, and for 70 and 80 to be categorized as >60. If possible, I would also like dynamic text labels A, B, C, D, E, depending on the number of categories created.


Solution

  • For the first part, adding -np.inf and np.inf to the bins will ensure that everything gets a bin:

    In [5]: df= pd.DataFrame({'days': [0,31,45,35,19,70,80]})
       ...: df['range'] = pd.cut(df.days, [-np.inf, 0, 30, 60, np.inf])
       ...: df
       ...:
    Out[5]:
       days         range
    0     0   (-inf, 0.0]
    1    31  (30.0, 60.0]
    2    45  (30.0, 60.0]
    3    35  (30.0, 60.0]
    4    19   (0.0, 30.0]
    5    70   (60.0, inf]
    6    80   (60.0, inf]
    

  • For the second part, .cat.codes gives the integer index of each row's bin, which you can then map to a letter:

    In [8]: df['range'].cat.codes.apply(lambda x: chr(x + ord('A')))
    Out[8]:
    0    A
    1    C
    2    C
    3    C
    4    B
    5    D
    6    D
    dtype: object
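
The two steps can also be combined by passing labels= to pd.cut. The following is only a sketch: the helper name cut_with_overflow and the label wording ("<=0", "0-30", ">60") are assumptions made for illustration, not part of the answer above. Because pd.cut bins are right-inclusive by default, the value 0 lands in the lowest bin, so that bin is labelled "<=0" rather than "<0".

```python
import string

import numpy as np
import pandas as pd


def cut_with_overflow(values, edges):
    """Bin `values` on `edges`, adding open-ended bins below and above.

    Hypothetical helper for illustration; label wording is an assumption.
    """
    full_edges = [-np.inf, *edges, np.inf]
    # One label per bin: an open-ended first bin, the inner ranges,
    # and an open-ended last bin.
    labels = (
        [f"<={edges[0]}"]
        + [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
        + [f">{edges[-1]}"]
    )
    return pd.cut(values, full_edges, labels=labels)


df = pd.DataFrame({"days": [0, 31, 45, 35, 19, 70, 80]})
df["range"] = cut_with_overflow(df["days"], [0, 30, 60])

# One letter per category, in bin order (only works for up to 26 bins).
n_bins = len(df["range"].cat.categories)
df["grade"] = df["range"].cat.rename_categories(
    list(string.ascii_uppercase[:n_bins])
)
print(df)
```

Here the letters follow the number of categories created, so adding another edge to the list shifts the lettering automatically. Unlike the .cat.codes approach, the grade column stays categorical and keeps the bin ordering.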