Search code examples
pythonpandasbins

Pandas cut and specifying specific bin sizes


I have just been playing with cut and specifying specific bin sizes but sometimes I was getting incorrect data in my bins.

What I want to do is bin data depending on where it falls in my Risk Impact matrix.

Risk Evaluation Matrix

Below is the original code I used to create my dataFrame and allocate my bins and labels.

risk = {'Gross Risk': {0: 1, 1: 3, 2: 4, 3: 6, 4: 9, 5: 14, 6: 20, 7: 5, 8: 8, 9: 16, 10: 22, 11: 1, 12: 3, 13: 6, 14: 9, 15: 12, 16: 25}}
df = pd.DataFrame.from_dict(risk)

# Create a list of names we will call the bins
group_names = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# Specify our bin boundaries
evaluation_bins = [1, 4, 8, 12, 20, 25]

# And stitch it all together
df['Risk Evaluation'] = pd.cut(df['Gross Risk'], bins = evaluation_bins, labels = group_names, include_lowest = True)

This creates the following output

Dataframe output after applying bins

Checking this off against my Risk Matrix I can see that rows 6, 7, 8 and 15 are incorrectly segregated.

To fix this I then respecified the Evaluation_Bins data. Instead of taking the lower bound of the bin, I specified the upper bound.

evaluation_bins = [1, 3, 6, 10, 16, 25]

This gives me the results I want. But either I fundamentally misunderstood how to specify the Pandas cut boundaries, I thought I had to specify the lower limit, not the higher, or I have just fluked my way through to the result I wanted.

Correct result

Could someone put my mind at rest and let me know if I miss understood how the creation of fixed width bins worked or not?

Be nice :)


Solution

  • Change evaluation_bins and use right=False as parameter of pd.cut:

    evaluation_bins = [1, 4, 8, 12, 20, np.inf]
    
    df['Risk Evaluation2'] = pd.cut(df['Gross Risk'], bins=evaluation_bins, 
                                    labels=group_names, include_lowest=True, right=False)
    print(df)
    
    # Output
        Gross Risk Risk Evaluation Risk Evaluation2
    0            1        Very Low         Very Low
    1            3        Very Low         Very Low
    2            4        Very Low              Low
    3            6             Low              Low
    4            9          Medium           Medium
    5           14            High             High
    6           20            High        Very High
    7            5             Low              Low
    8            8             Low           Medium
    9           16            High             High
    10          22       Very High        Very High
    11           1        Very Low         Very Low
    12           3        Very Low         Very Low
    13           6             Low              Low
    14           9          Medium           Medium
    15          12          Medium             High
    16          25       Very High        Very High