
How to replace NaN values with the corresponding month and hour mean value


I'm trying to replace the NaN values in a DataFrame with the mean value for the corresponding month and hour of that same DataFrame.

Say I have a DataFrame of generic measurements in which values are missing at random in some rows and columns. Its first column holds the timestamp of each record, at hourly frequency.

I've created another DataFrame that holds the mean value for every hour of each month, but I can't replace the NaN values of the first DataFrame with the corresponding mean.

First, let's create a generic DataFrame like the one described:

import pandas as pd
import numpy as np

p = 0.1 
columns = ['A','B','C','D','E','F','G','H','I','J']
size = 1000
df = pd.DataFrame(np.random.randint(0,100,size=(size,len(columns))), columns= columns)
mask = np.random.choice([True,False] , size= df.shape, p=[p,1-p])
df = df.mask(mask)
df.insert(0, 'date', pd.date_range('2000-01-01 00:00', periods=size, freq='h'))  # 'H' is deprecated in recent pandas

Then let's create the DataFrame with the mean values:

mean_df = df.groupby([df.date.dt.month , df.date.dt.hour]).mean()
mean_df.index.set_names(['month' , 'hour'],inplace=True)
mean_df.reset_index(inplace=True)
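
To make the layout of mean_df concrete, here is a minimal sketch of the same groupby on a tiny, hypothetical frame. The names small, keys and small_mean are made up for illustration and are not part of the question. Renaming the key Series up front is just one way to get the month and hour labels without the separate set_names call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two days, the same two hours, one measure column.
small = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01 00:00", "2000-01-01 01:00",
                            "2000-01-02 00:00", "2000-01-02 01:00"]),
    "A": [10.0, np.nan, 30.0, 40.0],
})

# Group by (month, hour); naming the keys here labels the index levels.
keys = [small["date"].dt.month.rename("month"),
        small["date"].dt.hour.rename("hour")]
small_mean = small.groupby(keys)[["A"]].mean().reset_index()

# Hour 0 averages 10 and 30; hour 1 has one valid value (NaN is skipped by mean).
print(small_mean)
```

Note that after reset_index the month and hour columns come first, so column 'A' does not sit at the same position in the mean frame as in the original frame.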

I can do it for one column, but I couldn't do it for all the columns:

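# Row positions where column 'A' is NaN
empty = np.where(df['A'].isna())[0].tolist()
r = df.columns.get_loc('A')       # position of 'A' in df
m = mean_df.columns.get_loc('A')  # position of 'A' in mean_df (shifted by the month/hour columns)

for a in empty:
    ts = df.iat[a, 0]  # timestamp of the row being filled
    row = np.where((mean_df.month == ts.month) & (mean_df.hour == ts.hour))[0][0]
    df.iat[a, r] = mean_df.iat[row, m]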

Solution

  • I guess the easiest approach is to iterate over every column:

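    for c in columns:
        # Row positions where column c is NaN
        empty = np.where(df[c].isna())[0].tolist()
        r = df.columns.get_loc(c)       # position of c in df
        m = mean_df.columns.get_loc(c)  # position of c in mean_df (shifted by the month/hour columns)

        for a in empty:
            ts = df.iat[a, 0]  # timestamp of the row being filled
            row = np.where((mean_df.month == ts.month) & (mean_df.hour == ts.hour))[0][0]
            df.iat[a, r] = mean_df.iat[row, m]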
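
As an alternative to the explicit loops, the same fill can probably be done without a separate mean frame, using groupby(...).transform('mean') together with fillna. This is only a sketch under assumptions, not the method from the answer above: it rebuilds a seeded frame shaped like the one in the question, and the names rng, keys, group_means and filled are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (seeded for reproducibility).
rng = np.random.default_rng(0)
columns = list("ABCDEFGHIJ")
size = 1000
df = pd.DataFrame(rng.integers(0, 100, size=(size, len(columns))),
                  columns=columns).astype(float)
df = df.mask(rng.random(df.shape) < 0.1)  # drop roughly 10% of the values
df.insert(0, "date", pd.date_range("2000-01-01 00:00", periods=size,
                                   freq=pd.Timedelta(hours=1)))

# One key per row: the month and the hour of that row's timestamp.
keys = [df["date"].dt.month.rename("month"),
        df["date"].dt.hour.rename("hour")]

# transform('mean') returns a frame aligned with df[columns], where every cell
# holds the mean of its (month, hour) group; NaN values are skipped.
group_means = df[columns].groupby(keys).transform("mean")

# fillna with an aligned frame only touches the cells that are NaN.
filled = df.copy()
filled[columns] = df[columns].fillna(group_means)
```

If a (month, hour) group has no valid value at all in some column, its mean is NaN too and those cells stay empty, so it may be worth checking filled[columns].isna().any() afterwards.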