Conversion of Daily pandas dataframe to minute frequency does not work for 2 row dataframe


I am trying to convert a daily-frequency DataFrame to minute data. A previous post (Conversion of Daily pandas dataframe to minute frequency) suggested the ffill approach below, but it does not seem to work with DataFrames that consist of only 2 rows.

This is the DataFrame I want to convert:

import pandas as pd
# Named "records" rather than "dict" to avoid shadowing the built-in type.
records = [
    {'ticker': 'jpm', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-28', 'returns': 0.2},
    {'ticker': 'jpm', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'ge', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'fb', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'aapl', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'msft', 'date': '2016-11-29', 'returns': 0.2},
    {'ticker': 'amzn', 'date': '2016-11-29', 'returns': 0.2},
]
df = pd.DataFrame(records)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index(['date', 'ticker'], drop=True)

This works on the entire dataframe:

df_min = df.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()

But when I work with a smaller dataframe it returns an empty dataframe for some reason:

df2=df.iloc[0:2,:]

df2_min = df2.unstack().asfreq('Min', method='ffill').between_time('8:30','16:00').stack()

Does anyone have an explanation for this odd behaviour?

Edit: I noticed the code only works if the DataFrame has at least 7 rows.


Solution

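  • With only 2 input rows, unstack reshapes the DataFrame to a single row, so the DatetimeIndex holds just one value (2016-11-28 00:00). asfreq needs a first and a last timestamp to build a continuous minute index; with a single value it returns only that midnight row, which between_time('8:30','16:00') then filters out, leaving an empty DataFrame. This is also why at least 7 rows are needed: the 7th row is the first one dated 2016-11-29.

    A possible solution is to add the next day as a helper row after reshaping, fill it with the data of the last row, apply the original solution, and finally remove the helper row by position with iloc: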

    df2=df.iloc[0:2]
    print (df2)
                       returns
    date       ticker         
    2016-11-28 jpm         0.2
               ge          0.2
    
    df3 = df2.unstack()
    print (df3)
               returns     
    ticker         jpm   ge
    date                   
    2016-11-28     0.2  0.2
    
    df3.loc[df3.index.max() + pd.Timedelta(1, unit='d')] = df3.iloc[-1]
    print (df3)
               returns     
    ticker         jpm   ge
    date                   
    2016-11-28     0.2  0.2
    2016-11-29     0.2  0.2 <- helper row
    
    df_min = df3.asfreq('Min', method='ffill')
    print (df_min.tail())
                        returns     
    ticker                  jpm   ge
    date                            
    2016-11-28 23:56:00     0.2  0.2
    2016-11-28 23:57:00     0.2  0.2
    2016-11-28 23:58:00     0.2  0.2
    2016-11-28 23:59:00     0.2  0.2
    2016-11-29 00:00:00     0.2  0.2 <- helper row
    
    df_min = df_min.iloc[:-1].between_time('8:30','16:00').stack()
    #print (df_min)
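
The steps above can be wrapped in a small function. This is only a minimal sketch: the name daily_to_minute and its start/end parameters are hypothetical and not part of pandas or of the original answer. It uses the lowercase 'min' frequency alias and pd.Timedelta(days=1), which should behave the same as 'Min' and pd.Timedelta(1, unit='d') used above. It also reproduces the empty result from the question for comparison.

```python
import pandas as pd


def daily_to_minute(daily, start="8:30", end="16:00"):
    """Hypothetical helper: forward-fill daily rows to minute frequency.

    Expects a DataFrame indexed by (date, ticker), as in the question.
    """
    wide = daily.unstack()
    # Helper row one day after the last date, so asfreq has a range to fill.
    wide.loc[wide.index.max() + pd.Timedelta(days=1)] = wide.iloc[-1]
    minute = wide.asfreq("min", method="ffill")
    # Drop the helper row, keep the requested hours, go back to long format.
    return minute.iloc[:-1].between_time(start, end).stack()


records = [
    {"ticker": "jpm", "date": "2016-11-28", "returns": 0.2},
    {"ticker": "ge", "date": "2016-11-28", "returns": 0.2},
]
df2 = pd.DataFrame(records)
df2["date"] = pd.to_datetime(df2["date"])
df2 = df2.set_index(["date", "ticker"])

# Without the helper row the only timestamp is midnight, which lies
# outside 8:30-16:00, so nothing survives between_time.
naive = df2.unstack().asfreq("min", method="ffill").between_time("8:30", "16:00")
print(len(naive))

df2_min = daily_to_minute(df2)
print(df2_min.head())
```

Because the helper row is always appended, the same function should also cover the last day of a longer input such as the full df, which the one-line version cuts off at midnight of the final date.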