
Create consecutive labels for boxplot


I want to set the labels of a binned boxplot automatically based on the cut intervals. The bins are created by applying pd.cut() to a DataFrame. The list of bin edges passed to pd.cut() is specified manually (see the cut list below), but I want the boxplot labels to be generated automatically from that list.

How do I convert the cut list to a label list in code? The code below produces these labels:

['0-20', '20-40', '40-60', '60-80', '80-100']

However, I would like them to be:

['0-20', '21-40', '41-60', '61-80', '81-100']

Code:

cut = [0,20,40,60,80,100]
label = []  # must be initialised before appending

for i, p in enumerate(zip(cut, cut[1:])):
    label.append('{}-{}'.format(cut[i], cut[i + 1]))
print(label)

Solution

  • You could just do the following:

    cut = [0,20,40,60,80,100]
    label = []
    
    for p in zip(cut, cut[1:]):
        label.append('{}-{}'.format(p[0] + 1 if p[0] != 0 else p[0], p[1]))
    

    It will give you:

    label
    ['0-20', '21-40', '41-60', '61-80', '81-100']
    

    Here p is a pair of consecutive edges, (cut[i], cut[i+1]). You add 1 to the first value unless it is the start of the range, which is what the conditional expression p[0] + 1 if p[0] != 0 else p[0] handles. Note that this check compares against the value 0, so it assumes the first edge is 0.
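
As a rough sketch of how this could be generalised and plugged back into pd.cut(), the version below keys the "+1" on the position of the bin rather than on the value 0, so it should also work when the first edge is not 0. The helper name make_labels and the sample values are made up for illustration. It also assumes integer data, and it passes include_lowest=True so that the first bin really contains its lower edge, matching the '0-20' label.

```python
import pandas as pd


def make_labels(edges):
    """Build 'lo-hi' labels; every bin after the first starts at lower edge + 1."""
    labels = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        start = lo if i == 0 else lo + 1  # only the first bin keeps its lower edge
        labels.append('{}-{}'.format(start, hi))
    return labels


cut = [0, 20, 40, 60, 80, 100]
label = make_labels(cut)

# Hypothetical sample data, only to show the labels being used as bin names.
df = pd.DataFrame({'value': [0, 5, 20, 21, 40, 75, 100]})

# Bins are right-closed by default; include_lowest=True also puts 0 in the first bin.
df['bin'] = pd.cut(df['value'], bins=cut, labels=label, include_lowest=True)
print(df)
```

The resulting bin column is categorical and ordered by the cut list, so it can be used as the grouping variable for the boxplot.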