Here is my problem:
This is a sample of my DataFrame (it actually goes from 2002 to 2012)
df = pd.DataFrame(
{'Date':["2002-07-31","2002-07-31","2002-07-31","2002-07-31","2002-07-31","2002-08-31","2002-08-31","2002-08-31","2002-08-31","2002-08-31",'2002-09-30','2002-09-30','2002-09-30','2002-09-30','2002-09-30'],
'Name': ["Paul", "John", "Silvia", "Mike", "Cindy","Paul", "David", "Harry", "Mike", "Britney","Francis", "Michael", "Charlie", "Joe", "Hilary"]})
Which gives this
Date Name
0 2002-07-31 Paul
1 2002-07-31 John
2 2002-07-31 Silvia
3 2002-07-31 Mike
4 2002-07-31 Cindy
5 2002-08-31 Paul
6 2002-08-31 David
7 2002-08-31 Harry
8 2002-08-31 Mike
9 2002-08-31 Britney
10 2002-09-30 Francis
11 2002-09-30 Michael
12 2002-09-30 Charlie
13 2002-09-30 Joe
14 2002-09-30 Hilary
and I would like to resample the serie from Monthly to Daily DataFrame by keeping all the names fix from 2002-07-31 to 2002-08-30 and from 2002-08-31 to 2002-09-30 (the change only occurs at the end of each month so it looks like resampling with the ffill() method).
The result I am looking for is something like that :
Date Name
2002-07-31 Paul
2002-07-31 John
2002-07-31 Silvia
2002-07-31 Mike
2002-07-31 Cindy
2002-08-01 Paul
2002-08-01 John
2002-08-01 Silvia
2002-08-01 Mike
2002-08-01 Cindy
2002-08-02 Paul
2002-08-02 John
2002-08-02 Silvia
2002-08-02 Mike
2002-08-02 Cindy
2002-08-03 Paul
2002-08-03 John
2002-08-03 Silvia
2002-08-03 Mike
2002-08-03 Cindy
.....
2002-08-31 Paul
2002-08-31 David
2002-08-31 Harry
2002-08-31 Mike
2002-08-31 Britney
2002-09-01 Paul
2002-09-01 David
2002-09-01 Harry
2002-09-01 Mike
2002-09-01 Britney
....
2002-09-30 Francis
2002-09-30 Michael
2002-09-30 Charlie
2002-09-30 Joe
2002-09-30 Hilary
As you can see, the names only change at the end of each month. The most difficult step for me is that I have a selection of 5 names, and I don't really know how to resample to a daily Dataframe and still having 5 names for each day.
I already looked at this link
Resampling Error : cannot reindex a non-unique index with a method or limit
But It is not really the same issue and I still don't find any solution to manage mine. If you have any ideas, you are welcome!
First, ensure your Date
column is a datetime
object:
df['Date'] = df.Date.astype('datetime64')
Then, group by the Date
column, aggregate the names to list
, resample
by day and fill, finally perform an explode
to expand the list
s of names:
df.groupby('Date').agg(list).resample('D').ffill().explode('Name').reset_index()
# Result:
Date Name
0 2002-07-31 Paul
1 2002-07-31 John
2 2002-07-31 Silvia
3 2002-07-31 Mike
4 2002-07-31 Cindy
.. ... ...
305 2002-09-30 Francis
306 2002-09-30 Michael
307 2002-09-30 Charlie
308 2002-09-30 Joe
309 2002-09-30 Hilary
[310 rows x 2 columns]