Search code examples
pythonpandasdataframedata-manipulation

Dataframe add buisness days according to multiple features


I have a quite big data-frame with different products, locations, sold_value and the according date. I want to manipulate the DF in the way that each product for its according location gets dates values according to buisness day freq. As next step i want the sold_value if sold on same date (still according to product and its location) gets added to each other. So that basically each product with its according location has a buisness day freq with sold_value.

I tried multiple things already, at first i tied just to manipulate the DF with groupby.... as next i tried to make new dataframes for each product with its according location add the bday frq and .sum the sold_value according to date after that merging all the DFs but it wont quite work.

as a side fact there are hundreds of locations and products so i cant hard code their values

PS.: Thanks for Help

The DF:

product location date value
1 1 01-01-2022 1
1 1 04-01-2022 1
1 2 01-01-2022 5
2 1 01-01-2022 4
2 1 01-01-2022 3
2 2 01-01-2022 1

My wanted output:

product location date value
1 1 01-01-2022 1
1 1 02-01-2022 0
1 1 03-01-2022 0
1 1 04-01-2022 1
1 2 01-01-2022 5
2 1 01-01-2022 7
2 2 01-01-2022 1

Solution

  • First aggregate sum for column value:

    df['date'] = pd.to_datetime(df['date'], dayfirst=True)
    
    df = df.groupby(['ct','location','date'], as_index=False)['value'].sum()
    print (df)
       ct  location       date  value
    0   1         1 2022-01-01      1
    1   1         1 2022-01-04      1
    2   1         2 2022-01-01      5
    3   2         1 2022-01-01      7
    4   2         2 2022-01-01      1
    

    Then add 0 days per days in lambda function with Series.asfreq:

    f = lambda x: x.asfreq('D', fill_value=0)
    df1 = df.set_index('date').groupby(['ct','location'])['value'].apply(f).reset_index()
    print (df1)
       ct  location       date  value
    0   1         1 2022-01-01      1
    1   1         1 2022-01-02      0
    2   1         1 2022-01-03      0
    3   1         1 2022-01-04      1
    4   1         2 2022-01-01      5
    5   2         1 2022-01-01      7
    6   2         2 2022-01-01      1
    

    If want use business days ouput is different:

    f = lambda x: x.asfreq('B', fill_value=0)
    df2 = df.set_index('date').groupby(['ct','location'])['value'].apply(f).reset_index()
    print (df2)
       ct  location       date  value
    0   1         1 2022-01-03      0
    1   1         1 2022-01-04      1