Bin variable with pre-defined bins and closed/open intervals

I have a set of bins which can be defined by:

A set of tuples of non-overlaping boundaries of each bin:

Intervals: [(0,1), (1,2), (3,4)]
A set of indicators identifying which boundary of each tuple is closed:

Closed_Boundaries: ['right','right','both']
A set of labels for each interval

Labels: ['first','second','third']

I am looking for an efficient, elegant and scalable way of applying this binning to a numeric series in a pandas dataframe so that the result includes the respective labels as identified by the binning logic:

Data_input: [3.5, 1, 0.5, 3]

Data_result: ['third', 'first', 'first', 'third']

I tried with pandas.IntervalIndex.from_tuples() followed by pandas.cut(). However, the labels argument of pandas.cut() is disabled when using IntervalIndex.from_tuples() while the name argument of the latter doesn't allow me to set labels to be used as replacement values.

PS: The pandas problem with labels not supported with IntervalIndex has been discussed here.

Solution

If all intervals are closed on the same side

The simplest way is to use pd.cut. However, there is an outstanding bug where it ignores labels when bins is IntervalIndex.

def cut(array, bins, labels, closed='right'):
    _bins = pd.IntervalIndex.from_tuples(bins, closed=closed)

    x = pd.cut(array, _bins)
    x.categories = labels # workaround for the bug
    return x

array = [3.5, 1, 0.5, 3]
bins = [(0,1), (1,2), (3,4)]
labels = ['first', 'second', 'third']

df = pd.DataFrame({
    'value': array,
    'category': cut(array, bins, labels, closed='right')
})

Output:

   value category
0    3.5    third
1    1.0    first
2    0.5    first
3    3.0      NaN

If every interval is different

Things get a lot slower because the code is not vectorized but it's conceptually simple: for each item in the array, find the first bin that it falls into and add the label of that bin.

def cut(array, bins, labels):
    intervals = [pd.Interval(*b) for b in bins]

    categories = []
    for value in array:
        cat = None
        for i, interval in enumerate(intervals):
            if value in interval:
                cat = labels[i]
                break
        categories.append(cat)

    return categories

cut([3.5, 1, 0.5, 3], bins=[(0,1,'right'),(1,2,'right'),(3,4,'left')], labels=['first', 'second', 'third'])

I modified the bin tuples to include what side they are closed on. Options are left, right, both and neither.