python pandas binning

How do I efficiently bin values into overlapping bins using Pandas?


I would like to bin all the values from a float column into overlapping bins. The result could be a column of 1-D boolean vectors, one vector per value in the original column. Each vector contains True for every bin the value falls into and False for the others.

For example, if I have four bins [(0, 10), (7, 20), (15, 30), (30, 60)], and the original value is 9.5, the resulting vector should be [True, True, False, False].

I know how to iterate through all the ranges with a custom function using 'apply', but is there a way to perform this binning more efficiently and concisely?


Solution

  • Would a simple list comprehension meet your needs?

    Bins = [(0, 10), (7, 20), (15, 30), (30, 60)]
    # Both bounds are inclusive; gives [True, True, False, False]
    Result = [y[0] <= 9.5 <= y[1] for y in Bins]
    

    If your data is stored in a column named data of a pandas DataFrame df, you can define a helper function:

    def in_ranges(x, bins):
        # One bool per (low, high) bin; both bounds are inclusive
        return [low <= x <= high for low, high in bins]
    

    and apply it to the column:

    # Requires "import pandas as pd"; the column is selected by its name, 'data'
    df['data'].apply(lambda x: pd.Series(in_ranges(x, Bins), Bins))
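
Since apply calls the Python function once per row, it can become slow on large columns. One vectorized alternative is to compare the whole column against all bin edges at once with NumPy broadcasting. The following is a minimal sketch rather than a definitive implementation: the sample values in df, the names lows, highs, mask and result, and the string column labels are assumptions made for illustration, and both bounds are treated as inclusive, as in in_ranges above.

```python
import numpy as np
import pandas as pd

bins = [(0, 10), (7, 20), (15, 30), (30, 60)]

# Hypothetical sample data; replace with your own DataFrame.
df = pd.DataFrame({"data": [9.5, 18.0, 30.0, 61.0]})

# Split the bins into arrays of lower and upper edges, each of shape (n_bins,).
lows = np.array([low for low, _ in bins])
highs = np.array([high for _, high in bins])

# Reshape the column to (n_rows, 1) so it broadcasts against the (n_bins,) edges.
values = df["data"].to_numpy()[:, None]

# Boolean matrix of shape (n_rows, n_bins): True where low <= value <= high.
mask = (values >= lows) & (values <= highs)

# One boolean column per bin, labelled "low-high" (an illustrative choice).
result = pd.DataFrame(
    mask,
    index=df.index,
    columns=[f"{low}-{high}" for low, high in bins],
)
print(result)
```

Each row of result is the boolean vector for the corresponding value, so 9.5 maps to True, True, False, False. If a single column of vectors is preferred over one column per bin, list(mask) can be assigned to a new column instead. Replace >= or <= with a strict comparison if an edge should be exclusive.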