Search code examples
pythonpandas

Is there a way to group data by value in one column to produce a sum of contents in other column in pandas?


I'm sorry if this is a repeat, I can't find anything that gives me an answer...

I have a dataframe containing pixel values and the number of pixels of that value. It looks something like this:

Value    Count 
0.1457   900  
0.1458   1800
0.1459   900
0.2144   1800  
0.4357   2700
0.5764   900
0.7891   1800
0.7892   900
nan      0
nan      0

In this case each instance of nan indicates a single pixel with no data.

I'd like to group these values into 4 classes as follows...:

  • Low: <0.2
  • Mid: 0.2 - 0.6
  • High: >0.6
  • No Data: nan

...and then produce a sum for each class, like so using the above example data:

Class    Count
Lo       3600
Mid      6300
Hi       2700
ND       2

I appreciate there are likely to be several steps to this, but does anyone have any pointers?


Solution

  • import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({"Value":[i*0.01 for i in range(1000)]+[np.nan,np.nan],"Count ":list(range(1000))+[np.nan,1]})
    df['Class'] = ["Lo" if v < 0.2 else "Mid" if v > 0.2 and v < 0.6 else "ND" if np.isnan(v) else "Hi" for v in df.Value]
    
    df.groupby(['Class']).sum()
    

    Output:

    Class Value Count
    Hi 4977.5 497750
    Lo 1.9 190
    Mid 15.6 1560
    ND 0 1