Pair consecutive elements in a pd.cut-list to new histogram label-list?


I want to set the labels of a binned histogram automatically, based on the cut intervals. The bins are created by applying pd.cut() to a dataframe. The bin edges passed to pd.cut() are specified manually (see the cut list below), but I want the histogram labels to be derived from that list automatically. How do I convert the cut list to a label list in code?

#cut list
cut = [0,20,40,60,80,100]

#desired label list
label = ['[0-20]', ']20-40]', ']40-60]', ']60-80]', ']80-100]']

#to be used for:
pd_cut = pd.cut(df, cut, labels=label, include_lowest=True).astype(str)

Solution

  • You can use zip to iterate over consecutive pairs of bin edges and append each formatted interval to the list label:

    cut = [0,20,40,60,80,100]
    
    label = []
    
    for i, p in enumerate(zip(cut, cut[1:])):
      ob = '[' if i == 0 else ']'
      label.append('{}{}-{}]'.format(ob, *p))
    
    print(label)
    

    Output:

    ['[0-20]', ']20-40]', ']40-60]', ']60-80]', ']80-100]']
    

    Instead of zip, enumerate, and slicing, you can use a classic for loop with range and len:

    # start from an empty list so the labels are not appended twice
    label = []

    for i in range(len(cut) - 1):
      ob = '[' if i == 0 else ']'
      label.append('{}{}-{}]'.format(ob, cut[i], cut[i + 1]))
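
If you need the labels in more than one place, the same pairing logic can be wrapped in a small helper. The following is only a sketch: the function name make_labels is hypothetical (it is not part of pandas), and it assumes the cut list is sorted and that only the first interval is closed on the left, as in the question.

```python
def make_labels(cut):
    """Build interval labels such as '[0-20]', ']20-40]' from bin edges."""
    # Pair each edge with its successor: (0, 20), (20, 40), ...
    pairs = zip(cut, cut[1:])
    # Only the first interval gets '[' (it matches include_lowest=True);
    # every later interval is open on the left, so it starts with ']'.
    return [
        f"{'[' if i == 0 else ']'}{lo}-{hi}]"
        for i, (lo, hi) in enumerate(pairs)
    ]


cut = [0, 20, 40, 60, 80, 100]
label = make_labels(cut)
print(label)
```

The resulting list has one entry fewer than the cut list, which is what pd.cut expects for its labels argument, so it should be usable as labels=label in the call from the question.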