python pandas dataframe missing-data imputation

How to impute missing values in electricity time series data considering the previous and next 2 values on the same day and time previous 2 days?

I have a pandas dataframe like below:

meter IDs	date	00:00	00:30	01:00	01:30	.......	23:00	23:30
1	2020-09-01	0.30	0.40	0.41	0.42	.......	0.47	0.39
1	2020-09-02	0.36	0.39	nan	nan	.......	0.53	0.41
1	2020-09-03	0.26	0.33	0.38	nan	.......	nan	0.45
1	2020-09-04	0.36	0.40	0.41	0.42	.......	0.69	nan
1	2020-09-05	0.30	0.40	0.41	0.42	.......	0.69	0.62
:	:	:	:	:	:	.......	:	:
1	2020-12-31	0.30	0.40	0.41	0.42	.......	0.69	0.62
2	2020-09-01	0.55	0.57	nan	0.60	.......	0.68	0.62
2	2020-09-02	0.61	0.59	0.66	nan	.......	nan	nan
2	2020-09-03	0.64	nan	0.66	0	.......	0.66	0.67
2	2020-09-04	0.58	0.65	0.65	0.71	.......	0.69	0.66
2	2020-09-05	0.49	0.52	0.55	nan	.......	nan	0.62
:	:	:	:	:	:	.......	:	:
2	2020-12-31	0.48	0.51	nan	0.52	.......	0.63	0.61

I want to fill the missing values for the individual meter IDs considering mean of the previous and next 2 values on the same day and same time of previous 2 days?

How can i fix this problem? Thanks in advance.

Solution

pandas' fillna() function is a built-in tool for these scenarios. You can use this reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html. I think you might be able to integrate the mean and previous values. But, for a simple example, if you want to replace all NaN elements in columns ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively, and have a df below:

     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0

values = {"A": 0, "B": 1, "C": 2, "D": 3}
df.fillna(value=values)

Then you get:

    A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  2.0  1.0
2  0.0  1.0  2.0  3.0
3  0.0  3.0  2.0  4.0

Or, a custom filling function that iterates over the data frame might help:

import pandas as pd

df.sort_values(by=['meter IDs', 'date'], inplace=True)

def fill_missing(df):
    for meter_id, group in df.groupby('meter IDs'):
        for col in df.columns[2:]:
            for i in range(len(df)):
                if pd.isnull(df.at[i, col]):
                    prev_vals = group[(group['date'] < df.at[i, 'date']) & (group[col].notnull())].tail(2)[col]
                    next_vals = group[(group['date'] > df.at[i, 'date']) & (group[col].notnull())].head(2)[col]
                    mean_val = pd.concat([prev_vals, next_vals]).mean()
                    df.at[i, col] = mean_val

fill_missing(df)
print(df)