Here is my example data:
team | sales | month |
---|---|---|
a | 100 | 1/1/2023 |
a | 200 | 2/1/2023 |
b | 600 | 1/1/2023 |
b | 300 | 2/1/2023 |
load in pandas like so:
mydata = pd.DataFrame([
['team','sales','month'],
['a', 100, '1/1/2023'],
['a', 200, '2/1/2023'],
['b', 600, '1/1/2023'],
['b', 300, '2/1/2023']
])
mydata.columns = mydata.iloc[0]
mydata = mydata[1:]
mydata['month'] = pd.to_datetime(mydata['month'])
My desired outcome for team "a" is this data aggregated by each week as starting on Monday, like this:
team | sales | Monday Week |
---|---|---|
a | 22.58 | 1/2/2023 |
a | 22.58 | 1/9/2023 |
a | 22.58 | 1/16/2023 |
a | 22.58 | 1/23/2023 |
a | 42.17 | 1/30/2023 |
a | 50 | 2/6/2023 |
a | 50 | 2/13/2023 |
a | 50 | 2/20/2023 |
a | 14.29 | 2/27/2023 |
So the logic on the calculated sales per week is:
$100 of sales in January, so avg sales per day is 100/31 = 3.23 per day, * 7 days in a weeks = $22.58 for each week in January.
February is $200 over 28 days, so ($200/28)*7 = $50 a week in Feb.
The calculation on the week starting 1/30/2023 is a little more complicated. I need to carry the January rate the first 2 days of 1/30 and 1/31, then start summing the Feb rate for the following 5 days in Feb (until 2/5/2023). So it would be 5*(200/28)+2*(100/31) = 42.17
Is there a way to do this in Pandas? I believe the logic that may work is taking each monthly total, decomposing that into daily data with an average rate, then using pandas to aggregate back up to weekly data starting on Monday for each month, but I'm lost trying to chain together the date functions.
I think you have miscalculation for team A for the week of 1/30/2023
. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46
.
Here's one way to do that:
def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
tmp = (
# Put month to the index and convert it to monthly period
group.set_index("month")[["sales"]]
.to_period("M")
# Calculate the average daily sales
.assign(sales=lambda x: x["sales"] / x.index.days_in_month)
# Up-sample the dataframe to daily
.resample("1D")
.ffill()
# Sum by week
.groupby(pd.Grouper(freq="W"))
.sum()
)
# Clean up the index
tmp.index = tmp.index.to_timestamp().rename("week_starting")
return tmp
df.groupby("team").apply(get_weekly_sales)