python pandas time-series missing-data pycaret

Time series stock data with gaps in a dataframe to be modeled in PyCaret


I have a CSV file which I have imported as follows:

import pandas as pd

ps0pyc = pd.read_csv(r'/Users/swapnilgupta/Desktop/fend/p0.csv')
ps0pyc['Date'] = pd.to_datetime(ps0pyc['Date'], dayfirst=True)
ps0pyc

    Date    PORTVAL
0   2013-01-03  17.133585
1   2013-01-04  17.130434
2   2013-01-07  17.396581
3   2013-01-08  17.308323
4   2013-01-09  17.475933
... ... ...
2262    2021-12-28  205.214555
2263    2021-12-29  204.076193
2264    2021-12-30  203.615507
2265    2021-12-31  201.143990
2266    2022-01-03  204.867302
2267 rows × 2 columns

It is a time series dataframe of stock data with approximately 252 trading days per year, ranging from 2013 to 2022. I am trying to apply the time series module of PyCaret to it. The only problem I encounter is that PyCaret doesn't support modeling daily data with missing values, and my dataset has only about 252 trading days per year rather than a continuous 365/366 days.

What is an alternative solution, and how should I use data with such gaps in the PyCaret time series module?


Solution

  • Set the Date column as the index of your dataframe

    ps0pyc.set_index('Date',inplace=True)
    

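    Create a new continuous daily index covering the period of your data, so that no rows are added beyond the last observation

    new_idx = pd.date_range(ps0pyc.index.min(), ps0pyc.index.max(), freq='D')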
    

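    Reindex your dataframe to the newly created index, filling the missing days with 0

    ps0pyc = ps0pyc.reindex(new_idx, fill_value=0)
    

    Alternatively, reindex without fill_value so that the missing days become NaN, then forward fill or back fill them. Assign the result back rather than passing inplace=True, which makes the call return None. For price data this is usually preferable to 0, since a zero would look like the value collapsing on every non-trading day.

    ps0pyc = ps0pyc.reindex(new_idx)
    ps0pyc['PORTVAL'] = ps0pyc['PORTVAL'].ffill()
    # or
    ps0pyc['PORTVAL'] = ps0pyc['PORTVAL'].bfill()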
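The steps above can be tried end to end on a few rows. This is only a minimal sketch: the four-row sample is a hypothetical stand-in for p0.csv (values taken from the output shown in the question), and the names `df` and `filled` are made up for illustration. It shows the forward-fill variant, where weekend rows receive the last traded value.

```python
import pandas as pd

# Hypothetical sample standing in for p0.csv: trading days only,
# so the weekend of 2013-01-05/06 is missing.
df = pd.DataFrame(
    {
        "Date": ["03-01-2013", "04-01-2013", "07-01-2013", "08-01-2013"],
        "PORTVAL": [17.133585, 17.130434, 17.396581, 17.308323],
    }
)
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)
df = df.set_index("Date")

# Continuous daily index spanning only the observed period.
new_idx = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Reindex (the added days become NaN), then carry the last known value forward.
filled = df.reindex(new_idx).ffill()

print(filled)
```

The resulting frame has one row per calendar day and no missing values, which is the shape the question says PyCaret needs for daily data. Whether forward-filled weekends are acceptable for the model is a judgment call that depends on the use case.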