python time-series bigdata data-management

Testing for unstamped minutes in time series data

I have a DataFrame of minute-by-minute prices covering a 5-year period and want to establish whether any minutes are missing. A price is only stamped when there is a transaction, so some minutes have no row.

There are 4 entities, identified in a separate column, and I would like to know which entity is missing a minute as well as when it happened.

My first inclination is to resample and count the NaNs. What is the best way of doing this?

Solution

Until there is a better answer, here is how I have dealt with this. See the question "Merge with the nearest minute using pandas".

I wrote out the answer from that question, with the addition of printing all NaN values.

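    df_time = pd.DataFrame({'date': pd.date_range(start='yyyy/mm/dd', end='yyyy/mm/dd', freq='min')})
    df_time.info()

Comparing the row count reported by df_time.info() with a simple division (the length of the period divided by one minute) will confirm you have the right data size.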
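As a rough sketch of that size check: the start and end timestamps below are made up for illustration, and the only thing assumed about pandas is that date_range includes both endpoints by default, which is why one is added to the division.

```python
import pandas as pd

# Hypothetical period, for illustration only.
start = pd.Timestamp("2024-01-01 00:00")
end = pd.Timestamp("2024-01-02 00:00")

df_time = pd.DataFrame({"date": pd.date_range(start=start, end=end, freq="min")})

# Length of the period divided by one minute, plus one because
# date_range includes both the start and the end timestamp.
expected_rows = int((end - start) / pd.Timedelta(minutes=1)) + 1

print(len(df_time), expected_rows)
```

If the two numbers differ, the range or the frequency is probably not what was intended.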

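    df_combined = pd.merge(df_time, df_price, on='date', how='left')
    print(df_combined[df_combined.isna().any(axis=1)])

The left join matters here: pd.merge defaults to an inner join, which would silently drop every minute that has no transaction. With how='left' every minute from df_time is kept, and the minutes with no price show up as NaN rows.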
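The merge above works on dates alone, so it does not say which entity is missing a minute. One possible way to get that, sketched here on made-up data, is to build every expected (minute, entity) pair first and left-merge the prices onto it. The column names entity and price and the sample values are assumptions, not taken from the real data, and the sketch assumes at most one row per entity per minute.

```python
import pandas as pd

# Hypothetical sample: two entities over a five-minute window.
df_price = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:01", "2024-01-01 09:04",
        "2024-01-01 09:00", "2024-01-01 09:02", "2024-01-01 09:03",
        "2024-01-01 09:04",
    ]),
    "entity": ["A", "A", "A", "B", "B", "B", "B"],
    "price": [10.0, 10.1, 10.3, 20.0, 20.2, 20.3, 20.4],
})

minutes = pd.date_range("2024-01-01 09:00", "2024-01-01 09:04", freq="min")

# Every (minute, entity) pair that should exist.
full_grid = pd.MultiIndex.from_product(
    [minutes, df_price["entity"].unique()], names=["date", "entity"]
).to_frame(index=False)

# Left merge keeps every expected pair; pairs with no transaction get NaN.
df_combined = full_grid.merge(df_price, on=["date", "entity"], how="left")

# Rows with no price are the missing minutes, listed per entity.
missing = df_combined[df_combined["price"].isna()]
print(missing[["entity", "date"]])

# Count of missing minutes per entity.
print(missing.groupby("entity").size())
```

The missing frame then answers both parts of the question: which entity, and when.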

I then wanted each missing minute to carry the same price as the previous minute, since no transactions of significant difference had occurred. I did this with df_combined.ffill().
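One caveat if all 4 entities live in the same long-format frame: a plain ffill() can carry one entity's price into another entity's gap. A minimal sketch of a per-entity fill, again with assumed column names and made-up values:

```python
import pandas as pd

nan = float("nan")

# Hypothetical merged frame with gaps (NaN) for two entities.
df_combined = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:00",
        "2024-01-01 09:01", "2024-01-01 09:01",
        "2024-01-01 09:02", "2024-01-01 09:02",
    ]),
    "entity": ["A", "B", "A", "B", "A", "B"],
    "price": [10.0, 20.0, nan, 20.1, nan, nan],
})

# Sort so each entity's rows are in time order, then fill within the entity.
df_combined = df_combined.sort_values(["entity", "date"])
df_combined["price_filled"] = df_combined.groupby("entity")["price"].ffill()

print(df_combined)
```

A gap at the very start of an entity's history has no earlier price to copy, so it stays NaN after this step.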