I have been searching for a solution to this question for a few days, but unfortunately without success.
I have continuous values in a column like this:
Val: 1, 15, 2, 91, 52, 126
I need to convert these numbers into categories as intervals. For example, the first number should lie in the category (1-10).
I know we can define intervals and convert the data using pd.cut:
pd.cut(df.val, right=False)
but my problem is that I can't define the intervals by hand, as I have millions of values.
The ideal solution would be that I define the width of the interval, and each value is then automatically matched and converted to its category.
This would be my ideal output:
Val Val_Cat
1 1-10
15 10-20
2 1-10
91 90-100
52 50-60
126 120-130
One idea is to use integer division by 10 with //, then multiply by 10, and finally convert to strings (with replace if necessary):
s = df['Val'] // 10 * 10
df['new'] = s.replace(0, 1).astype(str) + '-' + (s + 10).astype(str)
print (df)
Val Val_Cat new
0 1 1-10 1-10
1 15 10-20 10-20
2 2 1-10 1-10
3 91 90-100 90-100
4 52 50-60 50-60
5 126 120-130 120-130
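The same arithmetic works for any interval width. Here is a minimal sketch, assuming integer values; the helper name `to_interval_labels` and the `width` parameter are hypothetical and only for illustration. Unlike the snippet above, it does not replace 0 with 1, so the first bucket is labelled 0-10:

```python
import pandas as pd


def to_interval_labels(values: pd.Series, width: int = 10) -> pd.Series:
    """Label each value with the 'lower-upper' bucket of the given width.

    Hypothetical helper: assumes integer values.
    """
    lower = values // width * width  # floor each value to its bucket's lower edge
    upper = lower + width            # upper edge of the same bucket
    return lower.astype(str) + '-' + upper.astype(str)


df = pd.DataFrame({'Val': [1, 15, 2, 91, 52, 126]})
df['new'] = to_interval_labels(df['Val'], width=10)

# The same column bucketed with a wider interval
labels_w50 = to_interval_labels(df['Val'], width=50)
```

Because it relies only on vectorized arithmetic, this should scale to millions of rows without having to list the intervals by hand.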
An alternative with f-strings:
df['new'] = df['Val'].map(lambda x: f'{x//10*10}-{(x//10*10)+10}')
print (df)
Val Val_Cat new
0 1 1-10 0-10
1 15 10-20 10-20
2 2 1-10 0-10
3 91 90-100 90-100
4 52 50-60 50-60
5 126 120-130 120-130
Your solution with cut can also work if you build the bins from the data:
bins = np.arange(0, df['Val'].max() // 10 * 10 + 20, 10)
df['new'] = pd.cut(df.Val, bins = bins, right=False)
print (df)
Val Val_Cat new
0 1 1-10 [0, 10)
1 15 10-20 [10, 20)
2 2 1-10 [0, 10)
3 91 90-100 [90, 100)
4 52 50-60 [50, 60)
5 126 120-130 [120, 130)
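If string categories such as 10-20 are preferred over the Interval objects that cut returns, the `labels` argument of `pd.cut` can be filled from the same bin edges. This is a sketch under the assumption that the values are non-negative integers; the `width` and `labels` variables are illustrative names, not part of the original code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Val': [1, 15, 2, 91, 52, 126]})

width = 10  # assumed interval width
# Bin edges from 0 up to the first multiple of width above the maximum
bins = np.arange(0, df['Val'].max() // width * width + 2 * width, width)
# One 'lower-upper' label per pair of neighbouring edges
labels = [f'{lo}-{hi}' for lo, hi in zip(bins[:-1], bins[1:])]

# right=False makes each bin closed on the left: [lower, upper)
df['new'] = pd.cut(df['Val'], bins=bins, labels=labels, right=False).astype(str)
```

Dropping the final `.astype(str)` keeps the column as an ordered categorical, which may be useful for sorting or grouping.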