I want to put the data into bins using pd.cut, which has a labels parameter for labeling the bins, but it is not working.
There is no error: the code runs, but the result does not use the labels.
Input:
pd.cut(datatot['YearBuilt'].values,bins=pd.IntervalIndex.from_breaks([1872,1900,1928,1956,1984,2011],closed='left'),labels=["vvo","vo","o","n","r"])
Output:
[[1984, 2011), [1956, 1984), [1984, 2011), [1900, 1928), [1984, 2011), ..., [1956, 1984), [1956, 1984), [1956, 1984), [1984, 2011), [1984, 2011)]
Length: 2919
Categories (5, interval[int64]): [[1872, 1900) < [1900, 1928) < [1928, 1956) < [1956, 1984) < [1984, 2011)]
The data should be labeled 'vvo', 'vo', etc. according to the labels, not shown as intervals.
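The labels argument is ignored when bins is an IntervalIndex. You can omit the IntervalIndex, pass the break points as a plain list, and add the parameter right=False to cut to get left-closed intervals: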
import pandas as pd

datatot = pd.DataFrame({'YearBuilt': range(1880, 2020, 10)})
# original call: labels are ignored because bins is an IntervalIndex
datatot['orig'] = pd.cut(datatot['YearBuilt'].values,bins=pd.IntervalIndex.from_breaks([1872,1900,1928,1956,1984,2011],closed='left'),labels=["vvo","vo","o","n","r"])
# labels not specified, for comparison
datatot['new1'] = pd.cut(datatot['YearBuilt'],bins=[1872,1900,1928,1956,1984,2011], right=False)
# labels specified
datatot['new2'] = pd.cut(datatot['YearBuilt'],bins=[1872,1900,1928,1956,1984,2011], right=False,labels=["vvo","vo","o","n","r"])
print(datatot)
    YearBuilt          orig          new1  new2
0        1880  [1872, 1900)  [1872, 1900)   vvo
1        1890  [1872, 1900)  [1872, 1900)   vvo
2        1900  [1900, 1928)  [1900, 1928)    vo
3        1910  [1900, 1928)  [1900, 1928)    vo
4        1920  [1900, 1928)  [1900, 1928)    vo
5        1930  [1928, 1956)  [1928, 1956)     o
6        1940  [1928, 1956)  [1928, 1956)     o
7        1950  [1928, 1956)  [1928, 1956)     o
8        1960  [1956, 1984)  [1956, 1984)     n
9        1970  [1956, 1984)  [1956, 1984)     n
10       1980  [1956, 1984)  [1956, 1984)     n
11       1990  [1984, 2011)  [1984, 2011)     r
12       2000  [1984, 2011)  [1984, 2011)     r
13       2010  [1984, 2011)  [1984, 2011)     r
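
If you would rather keep the IntervalIndex, one possible alternative is to cut first and rename the interval categories afterwards. This is only a sketch: the variable names (years, breaks, labels, by_interval, renamed) are made up for illustration, and it assumes the labels are listed in the same order as the intervals.

```python
import pandas as pd

# Same sample data as above, as a standalone Series.
years = pd.Series(range(1880, 2020, 10), name="YearBuilt")
breaks = [1872, 1900, 1928, 1956, 1984, 2011]
labels = ["vvo", "vo", "o", "n", "r"]

# Plain list of edges plus right=False: the labels become the categories.
labeled = pd.cut(years, bins=breaks, right=False, labels=labels)

# Alternative: cut with the IntervalIndex (categories are intervals),
# then rename the five interval categories to the five labels, in order.
intervals = pd.IntervalIndex.from_breaks(breaks, closed="left")
by_interval = pd.cut(years, bins=intervals)
renamed = by_interval.cat.rename_categories(labels)

print(labeled.tolist()[:3])   # ['vvo', 'vvo', 'vo']
print(renamed.tolist() == labeled.tolist())
```

Both approaches should give the same labels per row; the first is shorter, the second is handy if the intervals are needed elsewhere before labeling.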