pandas, dataframe, sktime

DataFrame to MultiIndex for sktime format


I have multivariate time series data in this format (a pd.DataFrame indexed on time):

Existing format

I am trying to use sktime, which requires the data to be in multi-index format. For example, to use a rolling window of 3 on the data above, sktime requires the following format, where the pd.DataFrame has a multi-index on (instance, time):

Desired format

Is it possible to transform the data into this new format?


Solution

  • Edit: here's a more straightforward and probably faster solution using row indexing.

    import pandas as pd

    df = pd.DataFrame({
        'time': range(5),
        'a': [f'a{i}' for i in range(5)],
        'b': [f'b{i}' for i in range(5)],
    })

    w = 3  # window length
    w_starts = range(len(df) - w + 1)  # start position of each window

    # Slice each overlapping window by position, tag it with its 'instance'
    # number, then concatenate and index on (instance, time).
    roll_df = pd.concat(
        df.iloc[s:s + w].assign(instance=i) for i, s in enumerate(w_starts)
    ).set_index(['instance', 'time'])

    print(roll_df)
    

    Output

                    a   b
    instance time        
    0        0     a0  b0
             1     a1  b1
             2     a2  b2
    1        1     a1  b1
             2     a2  b2
             3     a3  b3
    2        2     a2  b2
             3     a3  b3
             4     a4  b4
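
The question describes a frame that is already indexed on time, whereas the example above keeps time as a regular column. As a minimal sketch for the indexed case, the same windowing can be wrapped in a helper that labels each window through the `keys` and `names` arguments of `pd.concat`. The function name `to_rolling_multiindex`, the sample frame `ts`, and the fallback level name `"time"` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def to_rolling_multiindex(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Stack overlapping windows of a time-indexed frame under an
    (instance, time) MultiIndex. Hypothetical helper for illustration."""
    if not 1 <= window <= len(df):
        raise ValueError("window must be between 1 and len(df)")
    n_windows = len(df) - window + 1  # same count as len(df) - (w - 1) above
    return pd.concat(
        [df.iloc[start:start + window] for start in range(n_windows)],
        keys=list(range(n_windows)),  # one outer-level label per window
        names=["instance", df.index.name or "time"],
    )


# Hypothetical sample data, indexed on time as described in the question.
ts = pd.DataFrame(
    {"a": [f"a{i}" for i in range(5)], "b": [f"b{i}" for i in range(5)]},
    index=pd.RangeIndex(5, name="time"),
)
rolled = to_rolling_multiindex(ts, window=3)
print(rolled)
```

With 5 rows and a window of 3 this yields 5 - 3 + 1 = 3 instances of 3 rows each, matching the layout of the output shown above. Because each window is copied in full, memory use grows roughly with the window length, which may matter for long series.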