Search code examples
pythonpandasdataframegroup-by

Pandas dataframe count the values within a range and then groupby


I have a dataframe like this :

df =

    date&time   measur_temp cal_temp  diff
0   2022-01-28  27      20            7
1   2022-01-29  27      23            4
2   2022-01-30  30      33            3
3   2022-01-31  33      32            1

I would like to create new table (dataframe is also fine) with ranges of diff value, ranges of measur_temp and count them if they fall into those bins : For ex : In first row of above df, the diff is 7 and lies between range 7 and 10 and the temperature lies between range 21 and 30. Count this as one incident.

The final output should look like this, can anyone help me please ?.

   measur_Temp_range    diff<5  5=<diff<10  10=<diff<15  15=<diff<=20
0   10-20   0.0 0.0
1   21-30   1.1 0.0 
2   31-40   2.0 0.0 
3   41-50   0.0 0.0 

Solution

  • Use crosstab with binning columns by cut:

    df = pd.crosstab(pd.cut(df['measur_temp'],bins=[10,20,30,40,50]),
                      pd.cut(df['diff'],bins=[-np.inf,5,10,15,20], right=False),
                      dropna=False)
    print (df)
    diff         [-inf, 5.0)  [5.0, 10.0)  [10.0, 15.0)  [15.0, 20.0)
    measur_temp                                                      
    (10, 20]               0            1             1             0
    (20, 30]               1            0             0             0
    (30, 40]               1            0             0             0
    (40, 50]               0            0             0             0
    

    If need also count dynamic labels of bins use:

    b1 = range(10,60,10)
    lab1=[f'{a+1}-{b}' if i!=0 else f'{a}-{b}' for i,(a,b) in enumerate(zip(b1, b1[1:]))]
    
    b2 = list(range(0,25,5))
    #first value less like 5
    b2[0] = -np.inf
    
    lab2=[f'{a}=<diff<{b}' if i!=0 else f'diff<{b}' for i,(a,b) in enumerate(zip(b2, b2[1:]))]
    

    df = pd.crosstab(pd.cut(df['measur_temp'],bins=b1, labels=lab1),
                      pd.cut(df['diff'],bins=b2, labels=lab2, right=False),
                      dropna=False)
    
    print (df)
    
    diff         diff<5  5=<diff<10  10=<diff<15  15=<diff<20
    measur_temp                                              
    10-20             0           0            0            0
    21-30             2           1            0            0
    31-40             1           0            0            0
    41-50             0           0            0            0