For example, I have this table:
import pandas as pd

list_1 = [['1', 'y', 474.0],
          ['2', 'n', 482.0],
          ['3', 'n', 564.0],
          ['4', 'y', 549.0],
          ['5', 'y', 551.0],
          ['6', 'y', 555.0],
          ['7', 'n', 600.0],
          ['8', 'y', 357.0],
          ['9', 'y', 542.0],
          ['10', 'n', 462.0],
          ['11', 'n', 513.0],
          ['12', 'y', 526.0]]
labels = ['id', 'order_?', 'hours_spend']
df = pd.DataFrame(list_1, columns=labels)
df
Result:
id order_? hours_spend
0 1 y 474.0
1 2 n 482.0
2 3 n 564.0
3 4 y 549.0
4 5 y 551.0
5 6 y 555.0
6 7 n 600.0
7 8 y 357.0
8 9 y 542.0
9 10 n 462.0
10 11 n 513.0
11 12 y 526.0
I'm going to divide the hours_spend column into 3 groups with the NTILE method (pd.qcut), without labels:
df['ntile']=pd.qcut(df['hours_spend'],3)
df
Result:
id order_? hours_spend ntile
0 1 y 474.0 (356.999, 502.667]
1 2 n 482.0 (356.999, 502.667]
2 3 n 564.0 (549.667, 600.0]
3 4 y 549.0 (502.667, 549.667]
4 5 y 551.0 (549.667, 600.0]
5 6 y 555.0 (549.667, 600.0]
6 7 n 600.0 (549.667, 600.0]
7 8 y 357.0 (356.999, 502.667]
8 9 y 542.0 (502.667, 549.667]
9 10 n 462.0 (356.999, 502.667]
10 11 n 513.0 (502.667, 549.667]
11 12 y 526.0 (502.667, 549.667]
Now I have a column 'ntile' with data type 'category' (output of df.info()):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 4 columns):
id 12 non-null object
order_? 12 non-null object
hours_spend 12 non-null float64
ntile 12 non-null category
dtypes: category(1), float64(1), object(2)
memory usage: 556.0+ bytes
How can I add a new column with the mean of the range in the 'ntile' column?
You can compute it from the left and right endpoints of each interval:
df['mean_ntile'] = (df['ntile'].apply(lambda x: x.left).astype(int) + df['ntile'].apply(lambda x: x.right).astype(int))/2
print(df)
Output:
id order_? hours_spend ntile mean_ntile
0 1 y 474.0 (356.999, 502.667] 429.0
1 2 n 482.0 (356.999, 502.667] 429.0
2 3 n 564.0 (549.667, 600.0] 574.5
3 4 y 549.0 (502.667, 549.667] 525.5
4 5 y 551.0 (549.667, 600.0] 574.5
5 6 y 555.0 (549.667, 600.0] 574.5
6 7 n 600.0 (549.667, 600.0] 574.5
7 8 y 357.0 (356.999, 502.667] 429.0
8 9 y 542.0 (502.667, 549.667] 525.5
9 10 n 462.0 (356.999, 502.667] 429.0
10 11 n 513.0 (502.667, 549.667] 525.5
11 12 y 526.0 (502.667, 549.667] 525.5
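Note that astype(int) drops the fractional part of each endpoint before averaging, which is why the first bin shows 429.0 rather than the exact midpoint. A minimal sketch of that arithmetic, using the endpoints printed in the 'ntile' column (the variable names here are only illustrative):

```python
# Endpoints of the first bin, as displayed in the 'ntile' column
left, right = 356.999, 502.667

# What the astype(int) version computes: int() truncates toward zero
truncated_mean = (int(left) + int(right)) / 2
print(truncated_mean)  # 429.0

# Midpoint without truncation (roughly 429.833)
exact_mean = (left + right) / 2
print(exact_mean)
```

If the exact midpoint is wanted, casting the endpoints with astype(float) instead of astype(int) should avoid the truncation.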
As @ALlolz suggested, an easier way is to use the interval's mid attribute:
df['mean_ntile'] = df['ntile'].apply(lambda x: x.mid)
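For completeness, here is a self-contained sketch of the mid approach on the same hours_spend values. It assumes a reasonably recent pandas. The trailing astype(float) is my addition, in case apply returns a categorical column, and the mean_from_ends column is a hypothetical name used only to cross-check the result against the endpoints.

```python
import pandas as pd

hours = [474.0, 482.0, 564.0, 549.0, 551.0, 555.0,
         600.0, 357.0, 542.0, 462.0, 513.0, 526.0]
df = pd.DataFrame({"hours_spend": hours})

# Split into 3 quantile-based bins; each value is a pandas Interval
df["ntile"] = pd.qcut(df["hours_spend"], 3)

# Midpoint of each interval, converted to a plain float column
df["mean_ntile"] = df["ntile"].apply(lambda iv: iv.mid).astype(float)

# Cross-check: average the endpoints as floats (no int truncation)
left = df["ntile"].apply(lambda iv: iv.left).astype(float)
right = df["ntile"].apply(lambda iv: iv.right).astype(float)
df["mean_from_ends"] = (left + right) / 2

print(df)
```

The values differ slightly from the astype(int) version above, because mid works on the untruncated endpoints.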