I have a dataframe like this :
df =
date&time measur_temp cal_temp diff
0 2022-01-28 27 20 7
1 2022-01-29 27 23 4
2 2022-01-30 30 33 3
3 2022-01-31 33 32 1
I would like to create new table (dataframe is also fine) with ranges of diff value, ranges of measur_temp and count them if they fall into those bins : For ex : In first row of above df, the diff is 7 and lies between range 7 and 10 and the temperature lies between range 21 and 30. Count this as one incident.
The final output should look like this, can anyone help me please ?.
measur_Temp_range diff<5 5=<diff<10 10=<diff<15 15=<diff<=20
0 10-20 0.0 0.0
1 21-30 1.1 0.0
2 31-40 2.0 0.0
3 41-50 0.0 0.0
Use crosstab
with binning columns by cut
:
df = pd.crosstab(pd.cut(df['measur_temp'],bins=[10,20,30,40,50]),
pd.cut(df['diff'],bins=[-np.inf,5,10,15,20], right=False),
dropna=False)
print (df)
diff [-inf, 5.0) [5.0, 10.0) [10.0, 15.0) [15.0, 20.0)
measur_temp
(10, 20] 0 1 1 0
(20, 30] 1 0 0 0
(30, 40] 1 0 0 0
(40, 50] 0 0 0 0
If need also count dynamic labels of bins use:
b1 = range(10,60,10)
lab1=[f'{a+1}-{b}' if i!=0 else f'{a}-{b}' for i,(a,b) in enumerate(zip(b1, b1[1:]))]
b2 = list(range(0,25,5))
#first value less like 5
b2[0] = -np.inf
lab2=[f'{a}=<diff<{b}' if i!=0 else f'diff<{b}' for i,(a,b) in enumerate(zip(b2, b2[1:]))]
df = pd.crosstab(pd.cut(df['measur_temp'],bins=b1, labels=lab1),
pd.cut(df['diff'],bins=b2, labels=lab2, right=False),
dropna=False)
print (df)
diff diff<5 5=<diff<10 10=<diff<15 15=<diff<20
measur_temp
10-20 0 0 0 0
21-30 2 1 0 0
31-40 1 0 0 0
41-50 0 0 0 0