I have a pandas DataFrame with a different number of integers and NaNs in each row. I would like to allocate the values in each row into 8 bins: 4 bins for negative values and 4 bins for positive values per row. So there will be a different number of values in each bin per row. Any hints on how to adjust the qcut function for that? Thanks!
If I understand correctly, you could just do one qcut on the positive values and another qcut on the negative values.
For example, given the dataframe:
>>> df
vals
0 -0.456460
1 0.448368
2 0.186750
3 1.056617
4 -0.035620
5 -0.609843
6 0.126376
7 0.160817
8 -1.495441
9 0.730763
10 -0.005071
11 0.677918
12 -0.779553
13 0.717374
14 2.250258
15 -0.801028
16 0.306408
17 0.538970
18 -2.120528
19 1.066903
Use 2 qcuts, one for the positive values and one for the negative values.
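# Quartile-bin the positive values only; all other rows get NaN in 'bin' for now
df.loc[df.vals > 0, 'bin'] = pd.qcut(df.loc[df.vals > 0, 'vals'], q=4)
# Quartile-bin the negative values separately; exact zeros stay NaN in 'bin'
df.loc[df.vals < 0, 'bin'] = pd.qcut(df.loc[df.vals < 0, 'vals'], q=4)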
As a result, the values fall into 8 distinct bins, 4 for positive values and 4 for negative values:
>>> df
vals bin
0 -0.456460 (-0.695, -0.351]
1 0.448368 (0.276, 0.608]
2 0.186750 (0.125, 0.276]
3 1.056617 (0.812, 2.25]
4 -0.035620 (-0.351, -0.00507]
5 -0.609843 (-0.695, -0.351]
6 0.126376 (0.125, 0.276]
7 0.160817 (0.125, 0.276]
8 -1.495441 (-2.122, -0.975]
9 0.730763 (0.608, 0.812]
10 -0.005071 (-0.351, -0.00507]
11 0.677918 (0.608, 0.812]
12 -0.779553 (-0.975, -0.695]
13 0.717374 (0.608, 0.812]
14 2.250258 (0.812, 2.25]
15 -0.801028 (-0.975, -0.695]
16 0.306408 (0.276, 0.608]
17 0.538970 (0.276, 0.608]
18 -2.120528 (-2.122, -0.975]
19 1.066903 (0.812, 2.25]
Sorting the unique bins makes the 4 negative bins and the 4 positive bins easy to see:
np.sort(df['bin'].unique())
array([Interval(-2.1219999999999999, -0.97499999999999998, closed='right'),
Interval(-0.97499999999999998, -0.69499999999999995, closed='right'),
Interval(-0.69499999999999995, -0.35099999999999998, closed='right'),
Interval(-0.35099999999999998, -0.0050699999999999999, closed='right'),
Interval(0.125, 0.27600000000000002, closed='right'),
Interval(0.27600000000000002, 0.60799999999999998, closed='right'),
Interval(0.60799999999999998, 0.81200000000000006, closed='right'),
Interval(0.81200000000000006, 2.25, closed='right')], dtype=object)
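Since the question asks for bins per row, the same two-qcut idea could be applied row by row. The following is only a rough sketch under some assumptions: `bin_row` and the `wide` frame are made-up names and data, integer labels are used instead of intervals (0 to 3 for negatives, 4 to 7 for positives), and `duplicates="drop"` is added so that rows with repeated values do not raise because of non-unique bin edges. Rows with very few values may end up with fewer than 4 bins per sign.

```python
import numpy as np
import pandas as pd


def bin_row(row, q=4):
    """Hypothetical helper: label one row's negatives 0..q-1 and positives q..2q-1."""
    labels = pd.Series(np.nan, index=row.index, dtype="float64")
    neg = row[row < 0]  # NaN compares False, so NaNs are skipped
    pos = row[row > 0]  # exact zeros are skipped as well
    if not neg.empty:
        labels.loc[neg.index] = pd.qcut(neg, q=q, labels=False, duplicates="drop")
    if not pos.empty:
        # Offset the positive labels by q so they do not collide with the negative ones
        labels.loc[pos.index] = pd.qcut(pos, q=q, labels=False, duplicates="drop") + q
    return labels


# Made-up wide frame: each row holds a different mix of values and NaNs
wide = pd.DataFrame([
    [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0, np.nan],
    [-8.0, np.nan, -2.0, 5.0, 1.0, np.nan, 9.0, -3.0, 7.0],
])

# axis=1 passes each row to bin_row; the result has the same shape as `wide`
binned = wide.apply(bin_row, axis=1)
print(binned)
```

Using `labels=False` keeps the result numeric, which makes it easy to count values per bin afterwards. If the interval edges are needed, `labels=False` could be dropped, but then each row carries its own set of intervals.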