How to detect missing values in a time series for long periods

I have electrical consumption data between 2016-2019. The data was recorded every 30 minutes for 4 years. There is no data between 13/03/2019 - 31/03/209.

I want to ask how I can detect this miss by coding without visualization because I have 12 countries and they may have such missing values in other months and they are not visible. (Detect if there is a miss for more than 3 consecutive days). Thank you for your help!

This is the data:

        Country Code   Electric Consumption (MW)
Date (index)        
2016-01-01    84              354642.0
2016-01-02    84              376207.0
2016-01-03    84              381534.0
2016-01-04    84              435561.0
2016-01-05    84              447820.0

... ... ...
2019-12-27    12              374340.0
2019-12-28    12              372761.0
2019-12-29    12              379411.0
2019-12-30    12              416044.0
2019-12-31    12              87519.0

Solution

Here is a way to identify the 3 day gaps and fill them. Note that this is working on each unique country code. You can save the final_dfs with a list and use pd.concat() if you need to get them back together again.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Country Code': {'2016-01-01': 84,
  '2016-01-02': 84,
  '2016-01-03': 84,
  '2016-01-04': 84,
  '2016-01-05': 84,
  '2019-12-27': 12,
  '2019-12-28': 12,
  '2019-12-29': 12,
  '2019-12-30': 12,
  '2019-12-31': 12},
 'Electric Consumption (MW)': {'2016-01-01': 354642.0,
  '2016-01-02': 376207.0,
  '2016-01-03': 381534.0,
  '2016-01-04': 435561.0,
  '2016-01-05': 447820.0,
  '2019-12-27': 374340.0,
  '2019-12-28': 372761.0,
  '2019-12-29': 379411.0,
  '2019-12-30': 416044.0,
  '2019-12-31': 87519.0}})


# change index value for fake gap
df.index = ['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2019-12-27', '2019-12-28',
               '2019-12-29', '2019-12-30', '2020-01-06']

#convert object dates to datetime
df.index = pd.to_datetime(df.index)


for g in df['Country Code'].unique():
    
    # look at each unique country 
    country_slice = df.loc[df['Country Code'] == g]
    # use timedelta to identify 3 day gaps
    country_slice['Three Day Gap'] = country_slice.index.to_series().diff() > pd.Timedelta('3d')
    # create a new index with previous min and max
    idx = pd.date_range(country_slice.index.min(), country_slice.index.max())

    s = country_slice['Electric Consumption (MW)']

    s.index = pd.DatetimeIndex(s.index)
    # this gives us a series with new rows and nans for the missing dates
    s = s.reindex(idx, fill_value=np.nan)
    # join the old data back to nex index
    country_slice_join = country_slice.join(s, how='outer', lsuffix='L')
    # now we can interpolate as missing dates are new rows
    country_slice_join['interpolate'] = country_slice_join['Electric Consumption (MW)'].interpolate(method='linear', axis=0)
    
    country_slice_join['Country Code'] = country_slice_join['Country Code'].ffill()
    # remove temp columns
    final_df = country_slice_join[['Country Code', 'interpolate']]
    
    final_df.columns = ['Country Code', 'Electric Consumption (MW)']

Example country_slice_join output before final_df:

           Country Code Electric Consumption (MW)L  Three Day Gap   Electric Consumption (MW)   interpolate
2019-12-27  12.0    374340.0    False   374340.0    374340.000000
2019-12-28  12.0    372761.0    False   372761.0    372761.000000
2019-12-29  12.0    379411.0    False   379411.0    379411.000000
2019-12-30  12.0    416044.0    False   416044.0    416044.000000
2019-12-31  12.0    NaN         NaN     NaN         369111.857143
2020-01-01  12.0    NaN         NaN     NaN         322179.714286
2020-01-02  12.0    NaN         NaN     NaN         275247.571429
2020-01-03  12.0    NaN         NaN     NaN         228315.428571
2020-01-04  12.0    NaN         NaN     NaN         181383.285714
2020-01-05  12.0    NaN         NaN     NaN         134451.142857
2020-01-06  12.0    87519.0     True    87519.0     87519.000000

Example final_df output without temporary columns:

         Country Code   Electric Consumption (MW)
2019-12-27  12.0    374340.000000
2019-12-28  12.0    372761.000000
2019-12-29  12.0    379411.000000
2019-12-30  12.0    416044.000000
2019-12-31  12.0    369111.857143
2020-01-01  12.0    322179.714286
2020-01-02  12.0    275247.571429
2020-01-03  12.0    228315.428571
2020-01-04  12.0    181383.285714
2020-01-05  12.0    134451.142857
2020-01-06  12.0    87519.000000