Search code examples
pandasdatepandas-groupbysklearn-pandas

How to add new 5-minute interval


I want to add common time intervals to all dates that are present only on some dates. This is the sample of my data. There are many more time intervals that are not present in this sample data.

Data:

Row   Date     Hour Minute  Open    Close

0   2006-12-11  10  0       736.0   778.0
1   2006-12-11  10  5       775.0   775.0
2   2006-12-11  10  10      778.8   780.0
3   2006-12-11  10  30      780.0   780.0
4   2006-12-11  10  45      780.0   780.0
5   2006-12-11  10  50      781.0   799.0
6   2006-12-12   9  0       736.0   778.0
7   2006-12-12   9  5       775.0   775.0
8   2006-12-12   9  10      778.8   780.0
9   2006-12-12  10  0       780.0   780.0
10  2006-12-12  10  5       780.0   780.0
11  2006-12-12  10  10      781.0   799.0
12  2006-12-12  10  15      780.0   780.0
13  2006-12-12  10  45      780.0   780.0
14  2006-12-12  10  50      781.0   799.0


Expected Output:
Row   Date     Hour Minute  Open    Close

0   2006-12-11   9  0       null    null
1   2006-12-11   9  5       null    null
2   2006-12-11   9  10      null    null
3   2006-12-11  10  0       736.0   778.0
4   2006-12-11  10  5       775.0   775.0
5   2006-12-11  10  10      778.8   780.0
6   2006-12-11  10  15      null    null
7   2006-12-11  10  30      780.0   780.0
8   2006-12-11  10  45      780.0   780.0
9   2006-12-11  10  50      781.0   799.0
10  2006-12-12   9  0       736.0   778.0
11  2006-12-12   9  5       775.0   775.0
12  2006-12-12   9  10      778.8   780.0
12  2006-12-12  10  0       780.0   780.0
14  2006-12-12  10  5       780.0   780.0
15  2006-12-12  10  10      781.0   799.0
16  2006-12-12  10  15      780.0   780.0
17  2006-12-11  10  30      null    null
18  2006-12-12  10  45      780.0   780.0
19  2006-12-12  10  50      781.0   799.0

Solution

  • You can use DataFrame.unstack with DataFrame.stack for add missing combinations:

    df1 = (df.set_index(['Date','Hour','Minute'])
             .unstack([1,2])
             .stack([1,2],dropna=False)
             .reset_index())
    

    Or DataFrame.reindex with MultiIndex.from_product:

    df1 = df.set_index(['Date','Hour','Minute'])
    mux = pd.MultiIndex.from_product(df1.index.levels)
    df1 = df1.reindex(mux).reset_index()
    

    print (df1)
              Date  Hour  Minute   Open  Close
    0   2006-12-11     9       0    NaN    NaN
    1   2006-12-11     9       5    NaN    NaN
    2   2006-12-11     9      10    NaN    NaN
    3   2006-12-11     9      15    NaN    NaN
    4   2006-12-11     9      30    NaN    NaN
    5   2006-12-11     9      45    NaN    NaN
    6   2006-12-11     9      50    NaN    NaN
    7   2006-12-11    10       0  736.0  778.0
    8   2006-12-11    10       5  775.0  775.0
    9   2006-12-11    10      10  778.8  780.0
    10  2006-12-11    10      15    NaN    NaN
    11  2006-12-11    10      30  780.0  780.0
    12  2006-12-11    10      45  780.0  780.0
    13  2006-12-11    10      50  781.0  799.0
    14  2006-12-12     9       0  736.0  778.0
    15  2006-12-12     9       5  775.0  775.0
    16  2006-12-12     9      10  778.8  780.0
    17  2006-12-12     9      15    NaN    NaN
    18  2006-12-12     9      30    NaN    NaN
    19  2006-12-12     9      45    NaN    NaN
    20  2006-12-12     9      50    NaN    NaN
    21  2006-12-12    10       0  780.0  780.0
    22  2006-12-12    10       5  780.0  780.0
    23  2006-12-12    10      10  781.0  799.0
    24  2006-12-12    10      15  780.0  780.0
    25  2006-12-12    10      30    NaN    NaN
    26  2006-12-12    10      45  780.0  780.0
    27  2006-12-12    10      50  781.0  799.0