First I create a Pandas dataframe containing historical 1min OHLCV data for the day, e.g.:
open high low close volume
date
2019-10-30 07:55:00 3034.00 3034.25 3033.75 3034.00 209
2019-10-30 07:56:00 3034.00 3034.25 3033.75 3034.00 315
2019-10-30 07:57:00 3034.25 3034.50 3033.75 3034.25 432
2019-10-30 07:58:00 3034.00 3034.25 3033.75 3033.75 329
2019-10-30 07:59:00 3034.00 3034.25 3033.75 3034.00 231
The next moment I subscribe to a live tick feed using a listener class and resample this into a continuously updated 1min OHLCV data dataframe, e.g.:
open high low close volume
date
2019-10-30 07:59:00 3033.75 3034.00 3033.75 3034.00 35
2019-10-30 08:00:00 3033.75 3034.25 3033.25 3033.75 117
2019-10-30 08:01:00 3033.75 3034.00 3033.75 3034.00 78
How do I merge these two, so that each new row of live data (ticks resampled to 1min rows) is appended to the historical data? Another issue is the overlap between the last minute of historical data and the first minute of live data - these need to be combined.
# isolate the new indexes, (present in live_df but not in hist_df)
new_locs = ~live_df.index.isin(hist_df.index)
# append just the new entries in the live_df
new_df = hist_df.append(live_df.loc[new_locs])
If your history df grows exceptionally long, this may get slow over time. If you keep your dataframe sorted in ascending time you could simplify the new_locs
check to only look at the most recent few rows. with .iloc()