python pandas cut categorical-data

How to convert continuous numbers into categories using pandas?


I have been searching for a solution to this problem for a few days, without success.

I have continuous values in a column like this:

Val: 1, 15, 2, 91, 52, 126

I need to convert these numbers into interval categories. For example, the first number should fall in the category 1-10.

I know we can define intervals and convert the data using pd.cut:

pd.cut(df.val, right=False)

but my problem is that I can't define the intervals by hand, because I have millions of values.

Ideally, I could specify the interval width and have each value assigned to the matching category automatically.

This would be my ideal output:

Val     Val_Cat
1        1-10
15       10-20
2        1-10
91       90-100
52       50-60
126      120-130

Solution

  • One idea is to use integer division by 10 (//), then multiply by 10, and finally convert to strings (with replace if necessary, to turn the lower bound 0 into 1):

    s = df['Val'] // 10 * 10
    df['new'] = s.replace(0, 1).astype(str) + '-' + (s + 10).astype(str)
    print (df)
       Val  Val_Cat      new
    0    1     1-10     1-10
    1   15    10-20    10-20
    2    2     1-10     1-10
    3   91   90-100   90-100
    4   52    50-60    50-60
    5  126  120-130  120-130
    

    Alternative with f-strings (here the first interval is labeled 0-10 instead of 1-10):

    df['new'] = df['Val'].map(lambda x: f'{x//10*10}-{(x//10*10)+10}')
    print (df)
       Val  Val_Cat      new
    0    1     1-10     0-10
    1   15    10-20    10-20
    2    2     1-10     0-10
    3   91   90-100   90-100
    4   52    50-60    50-60
    5  126  120-130  120-130
    

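    Your solution with cut can be adapted by generating the bin edges with np.arange:

    import numpy as np

    # Bin edges 0, 10, 20, ... up to the first multiple of 10 strictly above the maximum
    bins = np.arange(0, df['Val'].max() // 10 * 10 + 20, 10)

    df['new'] = pd.cut(df.Val, bins=bins, right=False)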
    print (df)
       Val  Val_Cat         new
    0    1     1-10     [0, 10)
    1   15    10-20    [10, 20)
    2    2     1-10     [0, 10)
    3   91   90-100   [90, 100)
    4   52    50-60    [50, 60)
    5  126  120-130  [120, 130)
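
If you want pd.cut to produce text labels like the ones in the question, and want the interval width to be adjustable, the pieces above can be combined roughly as follows. This is only a sketch: the function name to_interval_labels and its width parameter are made up for illustration, it assumes non-negative values, and the first interval is labeled 0-10 rather than 1-10.

```python
import numpy as np
import pandas as pd


def to_interval_labels(values: pd.Series, width: int = 10) -> pd.Series:
    """Bin non-negative numbers into fixed-width, left-closed intervals.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Last edge is the first multiple of `width` strictly above the maximum.
    upper = (int(values.max()) // width + 1) * width
    edges = np.arange(0, upper + width, width)
    # One text label per bin, e.g. "0-10", "10-20".
    bounds = edges.tolist()
    labels = [f"{lo}-{hi}" for lo, hi in zip(bounds[:-1], bounds[1:])]
    # right=False makes each bin include its lower edge: [lo, hi).
    return pd.cut(values, bins=edges, labels=labels, right=False)


df = pd.DataFrame({"Val": [1, 15, 2, 91, 52, 126]})
df["Val_Cat"] = to_interval_labels(df["Val"], width=10)
print(df)
```

The result is a categorical column, so the labels keep their numeric order when sorting or grouping. Passing a different width, for example width=50, changes the bins without touching the rest of the code.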