python, pandas, numbers

Binning a set of numbers in a specific way


I want to split a set of numbers from 100 (inclusive) to 200 (inclusive) into bins. The numbers need to be split into these intervals: [100, 135), [135, 160), [160, 175), [175, 190), [190, 200]. So far I have not found a function that does exactly this.

I have tried pd.cut with the right parameter set to False, but the resulting intervals were: [100, 135), [135, 160), [160, 175), [175, 190), [190, 200). The difference is that I need the last interval to include 200 (so [190, 200], not [190, 200)).
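
A minimal reproduction of that attempt, as a sketch: the sample values and the names `values` and `binned` are made up for illustration. With right=False every bin is closed on the left and open on the right, so the value 200 falls outside the last bin and comes back as NaN.

```python
import pandas as pd

bins = [100, 135, 160, 175, 190, 200]

# Illustrative values only: both ends of the range plus two bin boundaries
values = pd.Series([100, 134.9, 135, 199.9, 200])

# right=False makes every interval left-closed and right-open
binned = pd.cut(values, bins=bins, right=False)

# The last entry (exactly 200) is not covered by any interval, so it is NaN
print(binned)
```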


Solution

  • Example

    import pandas as pd
    s = pd.Series(range(1000, 2004)).div(10)
    

    s:

    0       100.0
    1       100.1
    2       100.2
    3       100.3
    4       100.4
            ...  
    999     199.9
    1000    200.0 <-- exactly 200
    1001    200.1
    1002    200.2
    1003    200.3
    Length: 1004, dtype: float64
    

    Code

    With right=False, pd.cut leaves the value 200 outside the last bin. How about handling that case separately, by applying a boolean mask to the result of pd.cut wherever the value is exactly 200?

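    bins = [100, 135, 160, 175, 190, 200]
    labels = ['[100, 135)', '[135, 160)', '[160, 175)', '[175, 190)', '[190, 200]']
    # True only where the value is exactly 200
    cond = s.eq(200)
    # Cut into left-closed bins, then assign the last label to the 200 case
    out = pd.cut(s, bins=bins, labels=labels, right=False).mask(cond, '[190, 200]')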
    

    out:

    0       [100, 135)
    1       [100, 135)
    2       [100, 135)
    3       [100, 135)
    4       [100, 135)
               ...    
    999     [190, 200]
    1000    [190, 200] <-- exactly 200
    1001           NaN
    1002           NaN
    1003           NaN
    Length: 1004, dtype: category
    Categories (5, object): ['[100, 135)' < '[135, 160)' < '[160, 175)' < '[175, 190)' < '[190, 200]']
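
A possible alternative to the mask, sketched under the assumption that the data is floating point: move the last bin edge up to the next representable float above 200 with np.nextafter, so that 200 itself lands inside the last left-closed bin. The names `edges` and `alt` are only illustrative, and this is a different technique from the masking approach above, not a replacement confirmed by the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1000, 2004)).div(10)

bins = [100, 135, 160, 175, 190, 200]
labels = ['[100, 135)', '[135, 160)', '[160, 175)', '[175, 190)', '[190, 200]']

# Replace the last edge with the smallest float greater than 200.
# No float64 value lies strictly between 200 and that edge, so the
# last bin effectively behaves like [190, 200].
edges = bins[:-1] + [np.nextafter(float(bins[-1]), np.inf)]

# Same left-closed cut as before, but 200 now falls inside the last bin
alt = pd.cut(s, bins=edges, labels=labels, right=False)
```

Here alt[1000] (the value exactly 200) gets the label '[190, 200]', while values above 200 remain NaN. The labels still have to be passed explicitly, because the automatically generated interval labels would show the nudged edge rather than 200.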