
The bins are not being labeled even though I provided a list of labels to pd.cut


I want to put the data into bins using pd.cut, which has a labels parameter for naming the bins, but the labels are not being applied.

There is no error; the code runs, but the output contains the intervals instead of the labels.

Input

pd.cut(datatot['YearBuilt'].values,bins=pd.IntervalIndex.from_breaks([1872,1900,1928,1956,1984,2011],closed='left'),labels=["vvo","vo","o","n","r"]) 

OUTPUT:

 [[1984, 2011), [1956, 1984), [1984, 2011), [1900, 1928), [1984, 2011), ..., [1956, 1984), [1956, 1984), [1956, 1984), [1984, 2011), [1984, 2011)]
 Length: 2919
 Categories (5, interval[int64]): [[1872, 1900) < [1900, 1928) < [1928, 1956) < [1956, 1984) < [1984, 2011)]

The data should be labeled 'vvo', 'vo', etc. according to the labels, not with the intervals.
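A minimal, self-contained reproduction of the behavior, using a few hypothetical years in place of the real datatot['YearBuilt'] data (the sample values are an assumption, not the original dataset):

```python
import pandas as pd

# A few hypothetical years standing in for datatot['YearBuilt']
years = pd.Series([1880, 1910, 1950, 1970, 2010])

binned = pd.cut(
    years,
    bins=pd.IntervalIndex.from_breaks([1872, 1900, 1928, 1956, 1984, 2011], closed="left"),
    labels=["vvo", "vo", "o", "n", "r"],
)

# No error is raised, but the categories are still the intervals, not the labels
print(binned.cat.categories)
```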


Solution

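  • pd.cut ignores the labels parameter when bins is an IntervalIndex, which is why you get the intervals back. You can omit the IntervalIndex, pass the bin edges as a plain list, and add the parameter right=False to cut for left-closed intervals: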

    import pandas as pd
    
    datatot = pd.DataFrame({'YearBuilt':range(1880, 2020, 10)})
    
    datatot['orig'] = pd.cut(datatot['YearBuilt'].values,bins=pd.IntervalIndex.from_breaks([1872,1900,1928,1956,1984,2011],closed='left'),labels=["vvo","vo","o","n","r"])
    #no labels specified, for comparison
    datatot['new1'] = pd.cut(datatot['YearBuilt'],bins=[1872,1900,1928,1956,1984,2011], right=False) 
    #specified labels
    datatot['new2'] = pd.cut(datatot['YearBuilt'],bins=[1872,1900,1928,1956,1984,2011], right=False,labels=["vvo","vo","o","n","r"]) 
    print(datatot)
    #output:
        YearBuilt          orig          new1 new2
    0        1880  [1872, 1900)  [1872, 1900)  vvo
    1        1890  [1872, 1900)  [1872, 1900)  vvo
    2        1900  [1900, 1928)  [1900, 1928)   vo
    3        1910  [1900, 1928)  [1900, 1928)   vo
    4        1920  [1900, 1928)  [1900, 1928)   vo
    5        1930  [1928, 1956)  [1928, 1956)    o
    6        1940  [1928, 1956)  [1928, 1956)    o
    7        1950  [1928, 1956)  [1928, 1956)    o
    8        1960  [1956, 1984)  [1956, 1984)    n
    9        1970  [1956, 1984)  [1956, 1984)    n
    10       1980  [1956, 1984)  [1956, 1984)    n
    11       1990  [1984, 2011)  [1984, 2011)    r
    12       2000  [1984, 2011)  [1984, 2011)    r
    13       2010  [1984, 2011)  [1984, 2011)    r
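If you would rather keep the IntervalIndex (for example to state closed='left' explicitly), a different technique from the one above is to bin first and then replace the interval categories with the labels via Series.cat.rename_categories. A minimal sketch, assuming the same sample datatot; the column name 'renamed' is only for illustration:

```python
import pandas as pd

datatot = pd.DataFrame({"YearBuilt": range(1880, 2020, 10)})

bins = pd.IntervalIndex.from_breaks([1872, 1900, 1928, 1956, 1984, 2011], closed="left")
labels = ["vvo", "vo", "o", "n", "r"]

# Bin with the IntervalIndex (labels= would be ignored here), then rename the
# interval categories in order; the categorical keeps its bin ordering.
datatot["renamed"] = pd.cut(datatot["YearBuilt"], bins=bins).cat.rename_categories(labels)

print(datatot)
```

The labels are matched to the categories by position, so the list must be in the same order as the bins. Years outside [1872, 2011) come out as NaN with either approach.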