I have a dataset that consists of multiple stores/retailers:
RETAILER VOLUME DISP PRICE
store1 12 15 10
store1 10 8 17
store1 12 13 12
...
store2 22 22 30
store2 17 14 22
store2 23 18 18
...
store3 11 13 10
store3 12 13 13
store3 14 12 11
Unfortunately this data set does not contain dates so I need to generate date sequences for each store within this table. Frequency isn't really that important - days or months will be ok.
To make it easier let's assume that it starts on 2000-01-01 and number of dates for each RETAILER depends on number of observations for this RETAILER.
I'm able to handle this in R:
df <- df %>%
arrange(RETAILER) %>%
group_by(RETAILER) %>%
mutate(dates = seq(as.Date("2000-01-01"), by = "month", length.out = n()))
However, I'm unable to replicate this in Python. I've been trying with pd.date_range(), but can't find the solution.
Can anyone please suggest some way to perform this task in Python?
The desired output:
RETAILER VOLUME DISP PRICE DATE
store1 12 15 10 2000-01-01
store1 10 8 17 2000-02-01
store1 12 13 12 2000-03-01
...
store2 22 22 30 2000-01-01
store2 17 14 22 2000-02-01
store2 23 18 18 2000-03-01
...
store3 11 13 10 2000-01-01
store3 12 13 13 2000-02-01
store3 14 12 11 2000-03-01
Let us group
the dataframe on RETAILER
and use cumcount
to create sequential counter per RETAILER
then map
this counter to MonthBegin
offset and add a Timestamp('2000-01-01')
:
c = df.groupby('RETAILER').cumcount()
df['DATE'] = pd.Timestamp('2000-01-01') + c.map(pd.offsets.MonthBegin)
RETAILER VOLUME DISP PRICE DATE
0 store1 12 15 10 2000-01-01
1 store1 10 8 17 2000-02-01
2 store1 12 13 12 2000-03-01
3 store2 22 22 30 2000-01-01
4 store2 17 14 22 2000-02-01
5 store2 23 18 18 2000-03-01
6 store3 11 13 10 2000-01-01
7 store3 12 13 13 2000-02-01
8 store3 14 12 11 2000-03-01