Sorry if this question seems too basic, but I've been looking for an answer and haven't found one.
I have a dataset with lots of NaN values, and I've been working on some regressions to predict those nulls. Since the predictions come back as a numpy.ndarray, I've been trying to fill the gaps in the columns with those arrays, with no success.
I mean, the column is something like this:
['Records']
101 21
102 22
103 23
104 24
106 NaN
107 NaN
108 NaN
109 NaN
110 NaN
111 29
112 30
The array is:
y_pred = [25, 26, 27, 28]
fillna doesn't accept numpy arrays for this, and my other attempts (converting the array to a dict, to a pandas column, etc.) didn't work either.
The other issue is the length of the array, which will always be different from the length of the original column.
I appreciate your insights.
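First, if you want to replace all missing values with all the values of the array, the number of missing values must equal the length of the array:
# added a fifth value so the array length matches the five NaNs
y_pred = [25, 26, 27, 28, 30]
# boolean mask marking the missing rows
m = df['Records'].isna()
# assign the predictions to the masked rows, in order
df.loc[m, 'Records'] = y_pred
print(df)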
Records
101 21.0
102 22.0
103 23.0
104 24.0
106 25.0
107 26.0
108 27.0
109 28.0
110 30.0
111 29.0
112 30.0
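For a version that runs on its own, the steps above can be sketched as follows. The DataFrame is rebuilt from the listing in the question, and the explicit length check is an addition for illustration, not part of the original snippet.

```python
import numpy as np
import pandas as pd

# Rebuild the example column (index and values taken from the question's listing).
df = pd.DataFrame(
    {"Records": [21, 22, 23, 24, np.nan, np.nan, np.nan, np.nan, np.nan, 29, 30]},
    index=[101, 102, 103, 104, 106, 107, 108, 109, 110, 111, 112],
)

# One prediction per missing value; a numpy array works the same way as a list here.
y_pred = np.array([25, 26, 27, 28, 30])

mask = df["Records"].isna()

# Assumption: fail early if the counts differ, since this approach needs them equal.
if mask.sum() != len(y_pred):
    raise ValueError(f"{mask.sum()} NaNs but {len(y_pred)} predictions")

# Values are assigned to the masked rows in order of appearance.
df.loc[mask, "Records"] = y_pred
print(df)
```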
If the lengths may not match, create a helper Series, trimmed to the shorter of the two lengths, and pass it to Series.fillna:
Here the array is shorter than the number of NaNs (starting again from the original df):
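y_pred = [25, 26, 27, 28]
# boolean mask marking the missing rows
m = df['Records'].isna()
n_nan = m.sum()
n_pred = len(y_pred)
# trim both sides so the values and the index end up the same length
s = pd.Series(y_pred[:n_nan], index=df.index[m][:n_pred])
print(s)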
106 25
107 26
108 27
109 28
dtype: int64
df['Records'] = df['Records'].fillna(s)
print (df)
Records
101 21.0
102 22.0
103 23.0
104 24.0
106 25.0
107 26.0
108 27.0
109 28.0
110 NaN
111 29.0
112 30.0
Here the array is longer than the number of NaNs (again starting from the original df):
y_pred = [25, 26, 27, 28, 100, 200, 300]
# boolean mask marking the missing rows
m = df['Records'].isna()
n_nan = m.sum()
n_pred = len(y_pred)
# the extra predictions (200, 300) are dropped by the slice
s = pd.Series(y_pred[:n_nan], index=df.index[m][:n_pred])
print(s)
106 25
107 26
108 27
109 28
110 100
dtype: int64
df['Records'] = df['Records'].fillna(s)
print (df)
Records
101 21.0
102 22.0
103 23.0
104 24.0
106 25.0
107 26.0
108 27.0
109 28.0
110 100.0
111 29.0
112 30.0
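Both cases can be folded into one small helper. This is only a sketch: fill_nans_in_order is a hypothetical name, not a pandas function, and it assumes the column has a unique index so that fillna can align the helper Series by label.

```python
import numpy as np
import pandas as pd

def fill_nans_in_order(col, values):
    """Fill NaNs in `col` with `values` in order (hypothetical helper).

    Surplus values are ignored; surplus NaNs are left as NaN.
    """
    values = np.asarray(values)
    nan_index = col.index[col.isna()]
    # Use only as many items as both sides can supply.
    n = min(len(nan_index), len(values))
    helper = pd.Series(values[:n], index=nan_index[:n])
    # fillna aligns the helper Series with `col` by index label.
    return col.fillna(helper)

# The example column from the question.
col = pd.Series(
    [21, 22, 23, 24, np.nan, np.nan, np.nan, np.nan, np.nan, 29, 30],
    index=[101, 102, 103, 104, 106, 107, 108, 109, 110, 111, 112],
    name="Records",
)

short = fill_nans_in_order(col, np.array([25, 26, 27, 28]))
long = fill_nans_in_order(col, np.array([25, 26, 27, 28, 100, 200, 300]))
print(short)
print(long)
```

With a DataFrame this would be used as df['Records'] = fill_nans_in_order(df['Records'], y_pred). Returning a new Series instead of modifying the column in place keeps the original data available for comparison.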