Search code examples
pythonpandasbinningcategorization

How to efficiently categorise dataframe rows based on a range of values?


Hello i am relatively new to Python, i have a pandas dataframe looking similar to the one seen below, which i have imported from a txt file:

UnitNamePCU  Timestamp           PowerREF  PowerPCU         
F09       2014-11-01 00:20:00     60.73     79.64   
F09       2014-11-01 00:30:00     63.15     84.06   
F09       2014-11-01 01:00:00    101.06     84.03   
F09       2014-11-01 01:10:00    122.79    104.29   
F09       2014-11-01 01:20:00     86.57    110.08   
F09       2014-11-01 01:40:00     16.74     27.32   
F09       2014-11-01 01:50:00     12.59     21.69   
F09       2014-11-01 02:20:00     17.42     13.98

My question is how can i sort the data in columns PowerREF and PowerPCU into "bins" saying something like: bin1 = PowerREF < 50, bin2 = 50 < PowerREF < 100. I know i can do something like:

bin1 = [x for x in data.PowerREF if x < 50]
bin2 = [x for x in data.PowerREF if x > 50 and x < 100]

but since the "bin" intervals can change maybe next time its bin1 = x < 100, bin2 = 100 < x < 200. I know the bin intervals before hand and the maximum value of the last bin the limit. So i think i should be able to do something with:

PowerLim = 1500
binInterval = 50
bin = range(0, PowerLim+1, binInterval)

I just dont know how to incorporate it smart and efficiently into the binnin process.


Solution

  • IIUC, you can use pd.cut here:

    power_lim = 1500
    bins = np.arange(0, power_lim + 1, 50)
    
    pd.cut(df.PowerREF, bins=bins)
    
    F09     (50, 100]
    F09     (50, 100]
    F09    (100, 150]
    F09    (100, 150]
    F09     (50, 100]
    F09       (0, 50]
    F09       (0, 50]
    F09       (0, 50]
    Name: PowerREF, dtype: category
    Categories (30, interval[int64]): [(0, 50] < (50, 100] < (100, 150] < (150, 200] < ... <
                                       (1300, 1350] < (1350, 1400] < (1400, 1450] < (1450, 1500]]