Search code examples
pythonpandasindexingnanreindex

pandas re-indexing with missing dates


from dateutil.rrule import rrule, MONTHLY

def fread_year_month(strt_dt, end_dt):
        dates = [dt for dt in rrule(MONTHLY, dtstart=strt_dt, until=end_dt)]
        return dates

df = pd.DataFrame({
'value' : [4,2,5,6,7,8,6,5,4,1,2,4],
'date': fread_year_month(dt.datetime(2015, 1, 1),dt.datetime(2015, 12, 1)),
'stock': ['amzn']*12
},columns=[
'value', 'date', 'stock'] )

df2 = pd.DataFrame({
'value' : [1,1,1,1,1],
'date': fread_year_month(dt.datetime(2015, 1, 1),dt.datetime(2015, 5, 1)),
'stock': ['msft']*5
},columns=[
'value', 'date', 'stock'] )

df = df.append(df2)

df.set_index(['stock', 'date'], inplace=True)

I have the above pandas dataframe. As you can see the number of available data for amzn is not the same as msft. In this example the dates are sequential but it need not be the case (the dates can be any date).

If the universe of available dates are the dates for which I have data for AMZN then how can I add those exact dates for every other stock in my data frame with a NaN or NA.

In the example give, I want to insert the missing dates for msft in the index and insert NaN or NA for the value for those date indices.


Solution

  • If you want to work with your tickers as columns, could do an unstack, like this:

    In [71]: df.unstack(level=0)
    Out[71]: 
               value     
    stock       amzn msft
    date                 
    2015-01-01   4.0  1.0
    2015-02-01   2.0  1.0
    2015-03-01   5.0  1.0
    2015-04-01   6.0  1.0
    2015-05-01   7.0  1.0
    2015-06-01   8.0  NaN
    2015-07-01   6.0  NaN
    2015-08-01   5.0  NaN
    2015-09-01   4.0  NaN
    2015-10-01   1.0  NaN
    2015-11-01   2.0  NaN
    2015-12-01   4.0  NaN
    

    To reindex into the same shape, the from_product below creates a new MultiIndex with all the combinations of dates / tickers.

    In [75]: df.reindex(pd.MultiIndex.from_product(df.index.levels))
    Out[75]: 
                     value
    amzn 2015-01-01    4.0
         2015-02-01    2.0
         2015-03-01    5.0
         2015-04-01    6.0
         2015-05-01    7.0
         2015-06-01    8.0
         2015-07-01    6.0
         2015-08-01    5.0
         2015-09-01    4.0
         2015-10-01    1.0
         2015-11-01    2.0
         2015-12-01    4.0
    msft 2015-01-01    1.0
         2015-02-01    1.0
         2015-03-01    1.0
         2015-04-01    1.0
         2015-05-01    1.0
         2015-06-01    NaN
         2015-07-01    NaN
         2015-08-01    NaN
         2015-09-01    NaN
         2015-10-01    NaN
         2015-11-01    NaN
         2015-12-01    NaN