I am trying to write code that creates bins from a dataframe (account_raw) that contains blank values. My problem is that Python puts the blank values into my first bin, labelled '0k to 25k'. What I want to do is create a separate bin for blank values. Any ideas how to fix this? Thanks
ls_LoanGBVBucket = [0, 25000, 50000, 100000,
                    200000, 300000, 999999999999]
ls_LoanGBVBucketLabel = ['0k to 25k', '25k - 50k',
                         '50k - 100k', '100k - 200k',
                         '200k - 300k', 'More than 300k']
account_raw['LoanGBVBuckets'] = pd.cut(
    account_raw['IfrsBalanceEUR'],
    bins=ls_LoanGBVBucket,
    labels=ls_LoanGBVBucketLabel,
    include_lowest=True).astype(str)
I think the simplest approach is to post-process the values after pd.cut and set a custom category for the rows where the IfrsBalanceEUR column is missing:
account_raw['LoanGBVBuckets'] = pd.cut(account_raw['IfrsBalanceEUR'],
                                       bins=ls_LoanGBVBucket,
                                       labels=ls_LoanGBVBucketLabel,
                                       include_lowest=True).astype(str)
# Overwrite the bucket wherever the source balance is missing
account_raw.loc[account_raw['IfrsBalanceEUR'].isna(), 'LoanGBVBuckets'] = 'missing values'
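To make the approach above easy to try, here is a minimal self-contained sketch. The sample frame `df` and its three values are made up for illustration and are not the asker's data. The mask is built from the source column rather than from the bucket column, so it does not depend on how the missing value is rendered after `astype(str)`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing balance and two real ones
df = pd.DataFrame({'IfrsBalanceEUR': [np.nan, 100, 100000]})

bins = [0, 25000, 50000, 100000, 200000, 300000, 999999999999]
labels = ['0k to 25k', '25k - 50k', '50k - 100k',
          '100k - 200k', '200k - 300k', 'More than 300k']

# Bin the balances, then convert to plain strings so that any
# new label can be assigned without touching the categories
df['LoanGBVBuckets'] = pd.cut(df['IfrsBalanceEUR'],
                              bins=bins,
                              labels=labels,
                              include_lowest=True).astype(str)

# Select the rows by the original column, which still holds real NaN values
missing_mask = df['IfrsBalanceEUR'].isna()
df.loc[missing_mask, 'LoanGBVBuckets'] = 'missing values'

result = df['LoanGBVBuckets'].tolist()
print(result)
```

Note that after `astype(str)` the column is no longer categorical, so the bucket order is lost for sorting and grouping.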
EDIT:
Tested in pandas 0.25.0: missing values come out as NaN in the output. To replace them with a custom category, you first need cat.add_categories and then fillna:
import numpy as np
import pandas as pd

account_raw = pd.DataFrame({'IfrsBalanceEUR': [np.nan, 100, 100000]})
Bucket = [0, 25000, 50000, 100000, 200000, 300000, 999999999999]
Label = ['0k to 25k', '25k - 50k', '50k - 100k',
         '100k - 200k', '200k - 300k', 'More than 300k']
account_raw['LoanGBVBuckets'] = pd.cut(account_raw['IfrsBalanceEUR'],
                                       bins=Bucket,
                                       labels=Label,
                                       include_lowest=True)
print(account_raw)
   IfrsBalanceEUR LoanGBVBuckets
0             NaN            NaN
1           100.0      0k to 25k
2        100000.0     50k - 100k
account_raw['LoanGBVBuckets'] = (account_raw['LoanGBVBuckets'].cat
                                 .add_categories('missing values')
                                 .fillna('missing values'))
print(account_raw)
   IfrsBalanceEUR LoanGBVBuckets
0             NaN missing values
1           100.0      0k to 25k
2        100000.0     50k - 100k
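If this is needed for several columns, the add_categories plus fillna steps can be wrapped in a small helper. This is only a sketch: the function name `cut_with_missing`, its `missing_label` parameter, and the sample `balances` series are hypothetical and not part of pandas or of the question.

```python
import numpy as np
import pandas as pd


def cut_with_missing(values, bins, labels, missing_label='missing values'):
    """Bin a Series with pd.cut and give NaN rows their own category."""
    binned = pd.cut(values, bins=bins, labels=labels, include_lowest=True)
    # The new label has to be registered as a category first;
    # otherwise fillna refuses a value that is not a known category
    return binned.cat.add_categories(missing_label).fillna(missing_label)


bins = [0, 25000, 50000, 100000, 200000, 300000, 999999999999]
labels = ['0k to 25k', '25k - 50k', '50k - 100k',
          '100k - 200k', '200k - 300k', 'More than 300k']

# Hypothetical sample balances, including one missing value
balances = pd.Series([np.nan, 100, 100000, 250000])
out = cut_with_missing(balances, bins, labels)
print(out)
```

Unlike the astype(str) variant, the result stays categorical, so the bucket order is kept and the extra label is appended as the last category.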