I have electricity consumption data covering 2016-2019, recorded every 30 minutes over the four years. There is no data between 13/03/2019 and 31/03/2019.
This is the data:
              Country Code  Electric Consumption (MW)
Date (index)
2016-01-01              84                   354642.0
2016-01-02              84                   376207.0
2016-01-03              84                   381534.0
2016-01-04              84                   435561.0
2016-01-05              84                   447820.0
...                    ...                        ...
2019-12-27              12                   374340.0
2019-12-28              12                   372761.0
2019-12-29              12                   379411.0
2019-12-30              12                   416044.0
2019-12-31              12                    87519.0
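To see exactly which dates are missing (for example the 13/03/2019 - 31/03/2019 span), you can compare each country's index against a complete daily range. This is a minimal sketch, assuming the frame above is called df, has one row per country per day, and already has a DatetimeIndex:

import pandas as pd

# check each country separately so one country's dates don't hide another's gap
for code, grp in df.groupby('Country Code'):
    full_range = pd.date_range(grp.index.min(), grp.index.max(), freq='D')
    missing = full_range.difference(grp.index)
    print(code, missing)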
Here is a way to identify the 3 day gaps and fill them. Note that this works on each unique country code separately; the loop below collects each final_df in a list, so you can use pd.concat() afterwards if you need to put the countries back together again (shown after the code).
import pandas as pd
import numpy as np

df = pd.DataFrame({'Country Code': {'2016-01-01': 84,
                                    '2016-01-02': 84,
                                    '2016-01-03': 84,
                                    '2016-01-04': 84,
                                    '2016-01-05': 84,
                                    '2019-12-27': 12,
                                    '2019-12-28': 12,
                                    '2019-12-29': 12,
                                    '2019-12-30': 12,
                                    '2019-12-31': 12},
                   'Electric Consumption (MW)': {'2016-01-01': 354642.0,
                                                 '2016-01-02': 376207.0,
                                                 '2016-01-03': 381534.0,
                                                 '2016-01-04': 435561.0,
                                                 '2016-01-05': 447820.0,
                                                 '2019-12-27': 374340.0,
                                                 '2019-12-28': 372761.0,
                                                 '2019-12-29': 379411.0,
                                                 '2019-12-30': 416044.0,
                                                 '2019-12-31': 87519.0}})

# change the last index value to fake a gap for demonstration
df.index = ['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
            '2016-01-05', '2019-12-27', '2019-12-28',
            '2019-12-29', '2019-12-30', '2020-01-06']

# convert the string dates to datetime
df.index = pd.to_datetime(df.index)

final_dfs = []
for g in df['Country Code'].unique():
    # look at each unique country; copy so we can add columns without a SettingWithCopyWarning
    country_slice = df.loc[df['Country Code'] == g].copy()
    # use a timedelta to flag gaps of more than 3 days
    country_slice['Three Day Gap'] = country_slice.index.to_series().diff() > pd.Timedelta('3d')
    # create a new daily index spanning the slice's min and max dates
    idx = pd.date_range(country_slice.index.min(), country_slice.index.max())
    s = country_slice['Electric Consumption (MW)'].copy()
    s.index = pd.DatetimeIndex(s.index)
    # this gives us a series with new rows and NaNs for the missing dates
    s = s.reindex(idx, fill_value=np.nan)
    # join the old data back to the new index
    country_slice_join = country_slice.join(s, how='outer', lsuffix='L')
    # now we can interpolate, because the missing dates exist as rows
    country_slice_join['interpolate'] = country_slice_join['Electric Consumption (MW)'].interpolate(method='linear')
    country_slice_join['Country Code'] = country_slice_join['Country Code'].ffill()
    # drop the temporary columns and restore the original column names
    final_df = country_slice_join[['Country Code', 'interpolate']].copy()
    final_df.columns = ['Country Code', 'Electric Consumption (MW)']
    final_dfs.append(final_df)
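If you need the countries back in a single frame, this is where pd.concat() comes in, using the final_dfs list built in the loop above:

# stitch the per-country results back into one frame
combined = pd.concat(final_dfs)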
Example country_slice_join output before final_df:
            Country Code  Electric Consumption (MW)L  Three Day Gap  Electric Consumption (MW)    interpolate
2019-12-27          12.0                    374340.0          False                   374340.0  374340.000000
2019-12-28          12.0                    372761.0          False                   372761.0  372761.000000
2019-12-29          12.0                    379411.0          False                   379411.0  379411.000000
2019-12-30          12.0                    416044.0          False                   416044.0  416044.000000
2019-12-31          12.0                         NaN            NaN                        NaN  369111.857143
2020-01-01          12.0                         NaN            NaN                        NaN  322179.714286
2020-01-02          12.0                         NaN            NaN                        NaN  275247.571429
2020-01-03          12.0                         NaN            NaN                        NaN  228315.428571
2020-01-04          12.0                         NaN            NaN                        NaN  181383.285714
2020-01-05          12.0                         NaN            NaN                        NaN  134451.142857
2020-01-06          12.0                     87519.0           True                    87519.0   87519.000000
Example final_df output without temporary columns:
            Country Code  Electric Consumption (MW)
2019-12-27          12.0              374340.000000
2019-12-28          12.0              372761.000000
2019-12-29          12.0              379411.000000
2019-12-30          12.0              416044.000000
2019-12-31          12.0              369111.857143
2020-01-01          12.0              322179.714286
2020-01-02          12.0              275247.571429
2020-01-03          12.0              228315.428571
2020-01-04          12.0              181383.285714
2020-01-05          12.0              134451.142857
2020-01-06          12.0               87519.000000
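As a final sanity check, you can confirm that each country now has a complete daily index with no missing consumption values. A minimal sketch, assuming combined is the concatenated result from above:

for code, grp in combined.groupby('Country Code'):
    # after reindexing and interpolating, no gap should be larger than one day
    assert grp.index.to_series().diff().max() <= pd.Timedelta('1d')
    # and there should be no NaNs left in the interpolated column
    assert grp['Electric Consumption (MW)'].notna().all()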