Putting this out there to see if there is a feasible solution.
Suppose I have volumes for every month that look like this:
But the start date for these volumes falls partway through a month; in this example it is 7/15/2023.
Is there a way to resample the data so that it sums the correct volumes up to the end of each month? The answer should look like this:
I tried simple resampling, but it just shifts the dates to the end of the month, which is not what I wanted. Any ideas?
import pandas as pd
df = pd.DataFrame({"DATE":["07/15/2023","08/15/2023","09/15/2023"], "VOL":[100, 150, 100]})
df['DATE'] = df['DATE'].apply(pd.to_datetime)
df = df.resample('M', on='DATE').sum().reset_index()
print(df)
IIUC, you can do:
import calendar

import pandas as pd

df = pd.DataFrame(
{"DATE": ["07/15/2023", "08/15/2023", "09/15/2023"], "VOL": [100, 150, 100]}
)

def get_days_in_month(year, month):
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, month)[1]

df["DATE"] = df["DATE"].apply(pd.to_datetime)
days_to_month_end = ((df["DATE"] + pd.offsets.MonthEnd()) - df["DATE"]).dt.days
days_in_month = df["DATE"].apply(lambda x: get_days_in_month(x.year, x.month))
tmp = df["VOL"] * days_to_month_end / days_in_month
df["RESULT"] = (df["VOL"].shift().fillna(0) - tmp.shift().fillna(0)) + tmp
print(df)
Prints:
        DATE  VOL      RESULT
0 2023-07-15  100   51.612903
1 2023-08-15  150  125.806452
2 2023-09-15  100  122.580645
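For what it's worth, the same arithmetic can probably be written without the `calendar` helper. Each volume is split into the share that stays in its starting month (`VOL * days_left / days_in_month`) and the remainder, which is carried into the next row. For August that gives 150 * 16/31 ≈ 77.42 kept, plus 100 - 51.61 ≈ 48.39 carried over from July. Below is a minimal sketch, assuming the dates are in `MM/DD/YYYY` format and the rows are already sorted by date; the function name `prorate_monthly_volumes` and the `kept`/`carried` names are mine, not from the post above.

```python
import pandas as pd


def prorate_monthly_volumes(df, date_col="DATE", vol_col="VOL"):
    """Hypothetical helper: prorate each volume over its partial starting month."""
    # Assumes MM/DD/YYYY strings, as in the example data
    dates = pd.to_datetime(df[date_col], format="%m/%d/%Y")
    # Days from the start date to the end of its calendar month
    days_left = dates.dt.days_in_month - dates.dt.day
    # Share of each volume that stays in the starting month
    kept = df[vol_col] * days_left / dates.dt.days_in_month
    # Remainder spills into the next row; the first row receives nothing
    carried = (df[vol_col] - kept).shift(fill_value=0)
    return df.assign(**{date_col: dates, "RESULT": kept + carried})
```

Two caveats: the remainder of the last row (the part of September's volume that would land in October) is dropped, the same as in the code above. Also, for a date that already falls on a month end this sketch keeps nothing in that month, whereas `pd.offsets.MonthEnd()` rolls such a date forward to the next month end, so the two versions would differ in that edge case.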