Search code examples
pythonpandasdate-rangereindex

Re-index dataframe by new range of dates


I have a data frame containing a number of observations:

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-20   yellow     3

I would like to re-index the data frame and standardise the dates.

date         colour     orders
2014-10-20   red        7
2014-10-21   red        10
2014-10-22   red        NaN
2014-10-20   yellow     3
2014-10-21   yellow     NaN
2014-10-22   yellow     NaN

I though to order the data frame by colour and date, and then try to re-index it.

index = pd.date_range('20/10/2014', '22/10/2014')
test_df = df.sort(['colour', 'date'], ascending=(True, True))
ts = test_df.reindex(index)
ts

But it returns a new data frame with the right index but all NaN values.

date         colour     orders
2014-10-20   NaN        NaN
2014-10-21   NaN        NaN
2014-10-22   NaN        NaN

Solution

  • Starting from your exampe dataframe:

    In [51]: df
    Out[51]:
            date  colour  orders
    0 2014-10-20     red       7
    1 2014-10-21     red      10
    2 2014-10-20  yellow       3
    

    If you want to reindex on both 'date' and 'colour', one possibility is to set both as the index (a multi-index):

    In [52]: df = df.set_index(['date', 'colour'])
    
    In [53]: df
    Out[53]:
                       orders
    date       colour
    2014-10-20 red          7
    2014-10-21 red         10
    2014-10-20 yellow       3
    

    You can now reindex this dataframe, after you constructed to desired index:

    In [54]: index = pd.date_range('20/10/2014', '22/10/2014')
    
    In [55]: multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])
    
    In [56]: df.reindex(multi_index)
    Out[56]:
                       orders
    2014-10-20 red          7
               yellow       3
    2014-10-21 red         10
               yellow     NaN
    2014-10-22 red        NaN
               yellow     NaN
    

    To have the same output as your example output, the index should be sorted in the second level (level=1 as it is 0-based):

    In [60]: df2 = df.reindex(multi_index)
    
    In [64]: df2.sortlevel(level=1)
    Out[64]:
                       orders
    2014-10-20 red          7
    2014-10-21 red         10
    2014-10-22 red        NaN
    2014-10-20 yellow       3
    2014-10-21 yellow     NaN
    2014-10-22 yellow     NaN
    

    A possible way to generate the multi-index automatically would be (with your original frame):

    pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max(), freq='D'), 
                                df['colour'].unique()])
    

    Another way would be to use resample for each group of colors:

    In [77]: df = df.set_index('date')
    
    In [78]: df.groupby('colour').resample('D')
    

    This is simpler, but this does not give you the full range of dates for each colour, only the range of dates that is available for that colour group.