Apply a binner to other columns in pandas


I have created the following pandas dataframe called df:

import pandas as pd
import numpy as np

ds = {'degreeCentraloty':[1,2,3,4,5,6,7,8,9,10], 'col2' :['Email','Email','Email','Email','Email','Email','Other','Other','Other','Other']}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)
   degreeCentraloty   col2
0                 1  Email
1                 2  Email
2                 3  Email
3                 4  Email
4                 5  Email
5                 6  Email
6                 7  Other
7                 8  Other
8                 9  Other
9                10  Other

I have then taken a subset of the df dataframe by selecting only the rows for which col2 = "Email":

data = df.loc[df['col2'] == 'Email'].copy()  # .copy() avoids SettingWithCopyWarning when a column is added later

   degreeCentraloty   col2
0                 1  Email
1                 2  Email
2                 3  Email
3                 4  Email
4                 5  Email
5                 6  Email

I then binned the degreeCentraloty column like this:

data['dg_binned'] = pd.qcut(data['degreeCentraloty'], q = 2)
print(data)

   degreeCentraloty   col2     dg_binned
0                 1  Email  (0.999, 3.5]
1                 2  Email  (0.999, 3.5]
2                 3  Email  (0.999, 3.5]
3                 4  Email    (3.5, 6.0]
4                 5  Email    (3.5, 6.0]
5                 6  Email    (3.5, 6.0]

I need to convert the dg_binned field into a list that I can use as a binner. So from this:

   dg_binned
(0.999, 3.5]
(0.999, 3.5]
(0.999, 3.5]
  (3.5, 6.0]
  (3.5, 6.0]
  (3.5, 6.0]

I need to get this:

[3.5,6]

Does anybody know how to do this in pandas?


Solution

  • If I understand correctly, use:

    i = pd.IntervalIndex(data['dg_binned'])
    print(i)
    IntervalIndex([(0.999, 3.5], (0.999, 3.5], (0.999, 3.5], (3.5, 6.0], (3.5, 6.0], (3.5, 6.0]],
                  closed='right',
                  name='dg_binned',
                  dtype='interval[float64]')
    
    L = list(map(list, zip(i.left, i.right)))
    print(L)
    [[0.999, 3.5], [0.999, 3.5], [0.999, 3.5], [3.5, 6.0], [3.5, 6.0], [3.5, 6.0]]
    

    Or:

    L = [[iv.left, iv.right] for iv in data['dg_binned']]
    print(L)
    [[0.999, 3.5], [0.999, 3.5], [0.999, 3.5], [3.5, 6.0], [3.5, 6.0], [3.5, 6.0]]
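
Both variants return one [left, right] pair per row. If what is needed is only the distinct edges from the question ([3.5, 6]) so that the same bins can be applied to other columns, one possible approach is sketched below. It is not part of the answer above: it assumes the bins come from pd.qcut as in the question, and the names edges, upper_edges and rebinned are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'degreeCentraloty': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col2': ['Email'] * 6 + ['Other'] * 4,
})
data = df.loc[df['col2'] == 'Email'].copy()

# retbins=True makes qcut return the bin edges alongside the binned column
binned, edges = pd.qcut(data['degreeCentraloty'], q=2, retbins=True)
data['dg_binned'] = binned

# The categories of the binned column are the distinct intervals;
# their right ends are the upper bin edges
upper_edges = data['dg_binned'].cat.categories.right.tolist()
print(upper_edges)  # [3.5, 6.0]

# Reuse the same edges on another column (here the unfiltered column, as an
# assumed example); values outside the edges become NaN
rebinned = pd.cut(df['degreeCentraloty'], bins=edges, include_lowest=True)
print(rebinned)
```

Passing include_lowest=True to pd.cut keeps the smallest value (1) inside the first bin, which matches how qcut treated it when the bins were created.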