I work in the context of the supply chain. I'm trying to set deliveries. The date of receipt and the quantity delivered of the deliveries can be stochastic (advance, delay in relation to the fixed date and breakage or error in entering the order). To model this stochasticity I simulate several scenarii for example in the first scenario, the order arrives 40 units of item_1 and 45 of item_2 at time 1 (today = 0), in the second scenario 45 units arrive of item_1 and 39 of item_2 at time 2, etc. There can be like hundred scenarii.
All is represented with a dataframe like this one :
item_id reception_date ordered_qty
0 item_1 [1, 2, 1] [40, 45, 39]
1 item_2 [1, 2, 1] [45, 39, 42]
i° coord == i° scenario
I have a fixed horizon (for exemple 3 days). I want to give a more explicite view of the deliveries through horizon. I want to obtain a new dataframe like this one :
item_id period_date ordered_qty
0 item_1 0 [0, 0, 0]
1 item_1 1 [40, 0, 39]
2 item_1 2 [0, 45, 0]
3 item_2 0 [0, 0, 0]
4 item_2 1 [45, 0, 42]
5 item_2 2 [0, 39, 0]
In this dataframe, i° scenario is describe by the i° column of the matrix formed by all the scenarii ; we can see the quantity and the reception day looking at the corresponding day in the column period_date.
I've made several tries (merges, concate, for loop, etc.), the best I've obtain is bellow :
import pandas as pd
import numpy as np
trace_size = 3 #number of scenarii
df_initial = pd.DataFrame({
'item_id': ['item_1', 'item_2'],
'reception_date': [[1, 2, 1], [1, 2, 1]],
'ordered_qty': [[40, 45, 39], [45, 39, 42]]
})
df = pd.DataFrame(columns=['item_id', 'reception_date', 'ordered_qty'])
for z in range(trace_size):
df1 = df_initial.copy()
df1['reception_date']=df_initial['reception_date'].apply(lambda x: x[z])
df1['ordered_qty']=df_initial['ordered_qty'].apply(lambda x: x[z])
df = pd.concat([df, df1])
df = df.groupby(['item_id', 'reception_date'], as_index=False).agg({'ordered_qty': list})
dyn = pd.DataFrame({'item_id': np.repeat(df_initial['item_id'].values, 3),
'period_date': np.tile(np.arange(0, 3), 2)})
df_final = dyn.merge(df.rename(columns={'reception_date': 'period_date'}), how='outer').fillna(0)
with this result :
item_id period_date ordered_qty
0 item_1 0 0
1 item_1 1 [40, 39]
2 item_1 2 [45]
3 item_2 0 0
4 item_2 1 [45, 42]
5 item_2 2 [39]
it's better without for loop to make it more efficient but I don't think it's possible.
IIUC, you can reshape using:
trace_size = 3
out = (df_initial
.explode(['reception_date', 'ordered_qty'])
.assign(n=lambda d: d.groupby(level=0).cumcount())
# add missing combinations of
# item_id/reception_date/n
.pipe(lambda d: d
.set_index(['item_id', 'reception_date', 'n'])
.reindex(pd.MultiIndex.from_product([
d['item_id'].unique(),
range(trace_size),
range(trace_size)
], names=['item_id', 'reception_date', 'n']), fill_value=0)
.reset_index()
)
# aggregate as lists
.groupby(['item_id', 'reception_date'], as_index=False)
['ordered_qty'].agg(list)
)
Output:
item_id reception_date ordered_qty
0 item_1 0 [0, 0, 0]
1 item_1 1 [40, 0, 39]
2 item_1 2 [0, 45, 0]
3 item_2 0 [0, 0, 0]
4 item_2 1 [45, 0, 42]
5 item_2 2 [0, 39, 0]