Search code examples
pythonpandasresamplingstockohlc

Merging historical and live stock price data in pandas


First I create a Pandas dataframe containing historical 1min OHLCV data for the day, e.g.:

                    open    high    low close   volume
date                    
2019-10-30 07:55:00 3034.00 3034.25 3033.75 3034.00 209
2019-10-30 07:56:00 3034.00 3034.25 3033.75 3034.00 315
2019-10-30 07:57:00 3034.25 3034.50 3033.75 3034.25 432
2019-10-30 07:58:00 3034.00 3034.25 3033.75 3033.75 329
2019-10-30 07:59:00 3034.00 3034.25 3033.75 3034.00 231

The next moment I subscribe to a live tick feed using a listener class and resample this into a continuously updated 1min OHLCV data dataframe, e.g.:

                    open    high    low close   volume
date                    
2019-10-30 07:59:00 3033.75 3034.00 3033.75 3034.00 35
2019-10-30 08:00:00 3033.75 3034.25 3033.25 3033.75 117
2019-10-30 08:01:00 3033.75 3034.00 3033.75 3034.00 78

How do I merge these two, so that each new row of live data (ticks resampled to 1min rows) is appended to the historical data? Another issue is the overlap between the last minute of historical data and the first minute of live data - these need to be combined.


Solution

  • # isolate the new indexes, (present in live_df but not in hist_df)
    new_locs = ~live_df.index.isin(hist_df.index)
    
    # append just the new entries in the live_df
    new_df = hist_df.append(live_df.loc[new_locs])
    

    If your history df grows exceptionally long, this may get slow over time. If you keep your dataframe sorted in ascending time you could simplify the new_locs check to only look at the most recent few rows. with .iloc()