I am trying to divide line items with a start and end date into multiple rows based on months. Values should be calculated based on number of days in the specific months.
For instance, data of 1 line item:
id | StartDate | EndDate | Annual |
---|---|---|---|
abc | 12/12/2018 | 01/12/2019 | 120,450 |
expected output:
id | Month | Year | Monthly volume |
---|---|---|---|
abc | 12 | 2018 | 6,600 |
abc | 1 | 2019 | 10,230 |
abc | 2 | 2019 | 9,240 |
abc | 3 | 2019 | 10,230 |
abc | 4 | 2019 | 9,900 |
abc | 5 | 2019 | 10,230 |
abc | 6 | 2019 | 9,900 |
abc | 7 | 2019 | 10,230 |
abc | 8 | 2019 | 10,230 |
abc | 9 | 2019 | 9,900 |
abc | 10 | 2019 | 10,230 |
abc | 11 | 2019 | 9,900 |
Few things for next time you ask.
import pandas as pd
df = pd.DataFrame(
data = [['abc','12/12/2018','12/01/2019',120450]],
columns = ['id', 'startDate', 'EndDate', 'Annual']
)
df['startDate'] = pd.to_datetime(df['startDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# pd.bdate_range(start="2020/12/16", end="2020/12/26", freq="C", weekmask="Sat Sun")
# %%
df_start_end = df.melt(id_vars=['id', 'Annual'],value_name='date')
# credit to u/gen
# https://stackoverflow.com/questions/42151886/expanding-pandas-data-frame-with-date-range-in-columns
df = (
df_start_end.groupby('id')
.apply(lambda x: x.set_index('date')
.resample('M').pad())
.drop(columns=['id','variable'])
.reset_index()
)
print(df)