Tags: python, pandas, dataframe, aggregation

Pandas DataFrame: aggregate by ID and date


I'm trying to aggregate a DataFrame by both ID and date. Suppose I have a DataFrame:

  Publish date  ID  Price
0   2000-01-02   0     10
1   2000-01-03   0     20
2   2000-02-17   0     30
3   2000-01-04   1     40

I would like to sum the price by ID and by week (frequency = 1W) and get a DataFrame like:

  Publish date  ID  Price
0   2000-01-02   0     30
1   2000-02-17   0     30
2   2000-01-04   1     40

I understand this can be done by iterating over the IDs and using `Grouper` to aggregate the price for each one. Is there a more efficient way that avoids iterating over the IDs? Many thanks.


Solution

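  • Use `Grouper` together with `ID` in a single `groupby` and aggregate with `sum`, so no loop over the IDs is needed. I'm not sure which `Grouper` frequency you need, because the dates in your expected output don't match the bin labels of any of these options exactly: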

    import pandas as pd

    df['Publish date'] = pd.to_datetime(df['Publish date'])
    
    # Bin dates into weeks ending on Sunday and group by (week, ID) in one pass
    out = (df.groupby([pd.Grouper(freq='W', key='Publish date'), 'ID'], sort=False)['Price']
             .sum()
             .reset_index())
    print(out)
      Publish date  ID  Price
    0   2000-01-02   0     10
    1   2000-01-09   0     20
    2   2000-02-20   0     30
    3   2000-01-09   1     40
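`freq='W'` is an alias for `W-SUN`: each bin is a week ending on Sunday and is labelled by that Sunday. Since 2000-01-02 was itself a Sunday, it closes its own week, which is why it is not merged with 2000-01-03 above. A rough sketch that reproduces the same labels without `Grouper`, by converting each date to a weekly `Period` and taking the week's end date (a swapped-in alternative for illustration, not what `Grouper` does internally):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Publish date': ['2000-01-02', '2000-01-03', '2000-02-17', '2000-01-04'],
    'ID': [0, 0, 0, 1],
    'Price': [10, 20, 30, 40],
})
df['Publish date'] = pd.to_datetime(df['Publish date'])

# Map every date to the Sunday that ends its Monday-to-Sunday week
week_end = (df['Publish date']
            .dt.to_period('W-SUN')
            .dt.end_time
            .dt.normalize()
            .rename('Publish date'))

# Group by (week end, ID) in one pass, without looping over IDs
out = (df.groupby([week_end, 'ID'], sort=False)['Price']
         .sum()
         .reset_index())
print(out)
```

Writing the week boundary out explicitly like this can make it easier to check which dates end up in the same bin.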
    

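    Or, with weeks ending on Monday (`W-MON`), which puts 2000-01-02 and 2000-01-03 in the same week as in your expected output:

    df['Publish date'] = pd.to_datetime(df['Publish date'])
    
    out = (df.groupby([pd.Grouper(freq='W-MON', key='Publish date'), 'ID'], sort=False)['Price']
             .sum()
             .reset_index())
    print(out)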
      Publish date  ID  Price
    0   2000-01-03   0     30
    1   2000-02-21   0     30
    2   2000-01-10   1     40
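`Grouper` labels each row with the bin edge (here the Monday that closes the week), whereas the expected output in the question shows the first actual publish date in each group. If that labelling is what you need, a minimal sketch, assuming `W-MON` weeks, is to group on a copy of the date column (the helper column name `week` is made up for this example) and aggregate the original column with `min`:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Publish date': ['2000-01-02', '2000-01-03', '2000-02-17', '2000-01-04'],
    'ID': [0, 0, 0, 1],
    'Price': [10, 20, 30, 40],
})
df['Publish date'] = pd.to_datetime(df['Publish date'])

out = (df.assign(week=df['Publish date'])              # hypothetical helper column, used only as the bin key
         .groupby([pd.Grouper(freq='W-MON', key='week'), 'ID'], sort=False)
         .agg({'Publish date': 'min', 'Price': 'sum'})  # earliest real date and total price per group
         .reset_index(level='ID')                      # keep ID as a column
         .reset_index(drop=True)                       # discard the week-end bin label
         [['Publish date', 'ID', 'Price']])
print(out)
```

With the sample data this should give the same three rows as the expected output in the question.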
    

    Or, with fixed 7-day bins (`7D`) counted from the earliest date:

    df['Publish date'] = pd.to_datetime(df['Publish date'])
    
    out = (df.groupby([pd.Grouper(freq='7D', key='Publish date'), 'ID'], sort=False)['Price']
             .sum()
             .reset_index())
    print(out)
      Publish date  ID  Price
    0   2000-01-02   0     30
    1   2000-02-13   0     30
    2   2000-01-02   1     40
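With `freq='7D'` the bins are not tied to a weekday: they are consecutive 7-day windows counted from midnight of the earliest date (the default `origin='start_day'` in recent pandas versions) and labelled by each window's first day. A rough sketch of the same arithmetic done by hand, useful for checking which dates land together; the names `start`, `k` and `bin_start` are illustrative only:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Publish date': ['2000-01-02', '2000-01-03', '2000-02-17', '2000-01-04'],
    'ID': [0, 0, 0, 1],
    'Price': [10, 20, 30, 40],
})
df['Publish date'] = pd.to_datetime(df['Publish date'])

window = pd.Timedelta(days=7)
start = df['Publish date'].min().normalize()    # midnight of the earliest date

# Index of the 7-day window each row falls into, counted from `start`
k = (df['Publish date'] - start) // window
# Label each row with the first day of its window
bin_start = (start + k * window).rename('Publish date')

out = (df.groupby([bin_start, 'ID'], sort=False)['Price']
         .sum()
         .reset_index())
print(out)
```

Changing `start` shifts every window boundary, which is why `7D` bins can group dates differently from the weekday-anchored `W` frequencies.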