I have a DataFrame of string values with some missing values in it. The missing values need to be filled according to the conditions below.
1. For each NaN value's index, check the previous 3 rows and the next 3 rows, and replace the NaN with the most frequent value among those 6 rows.
2. If several values are tied for most frequent, replace the NaN with the tied value that has the lowest index among these 6 rows.
My DataFrame:
reading
0 talk
1 kill
2 NaN
3 vertical
4 type
5 kill
6 NaN
7 vertical
8 vertical
9 type
10 durable
11 NaN
12 durable
13 vertical
Expected output:
reading
0 talk
1 kill
2 kill
3 vertical
4 type
5 kill
6 vertical
7 vertical
8 vertical
9 type
10 durable
11 vertical
12 durable
13 vertical
Here is a minimal reproducible example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'reading': ['talk', 'kill', np.nan, 'vertical', 'type', 'kill', np.nan, 'vertical', 'vertical', 'type', 'durable', np.nan, 'durable', 'vertical']})
def filldf(df):
    # Do the logic here
    return df
I am not sure how to approach this problem. Any help would be appreciated!
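If you don't have too many NaN values, you can iterate over the index of the NaN "reading" values, take the 3 rows on either side of each one, and pick the most frequent value in that window. Note that mode() returns tied values in sorted order, so mode().iloc[0] would break ties alphabetically rather than by position. value_counts(sort=False) keeps the values in order of first appearance, so idxmax() returns the tied value with the lowest index. The window start is clamped with max(0, i-3), and the label slice assumes the default integer index. Then assign the results back to the corresponding NaN positions:
msk = df['reading'].isna()
df.loc[msk, 'reading'] = [df.loc[max(0, i-3):i+3, 'reading'].value_counts(sort=False).idxmax() for i in df.index[msk]]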
Output:
reading
0 talk
1 kill
2 kill
3 vertical
4 type
5 kill
6 vertical
7 vertical
8 vertical
9 type
10 durable
11 vertical
12 durable
13 vertical
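The fill rule from the question (most frequent value in the window, ties going to the lowest index) can also be written out explicitly. Below is a minimal sketch, not the only way to do it. The helper name most_frequent_first and the parameters col and k are made up for illustration, and it works on row positions so it does not assume any particular index. Windows are always read from the original, unfilled column, so an earlier fill never influences a later one.

```python
from collections import Counter

import numpy as np
import pandas as pd


def most_frequent_first(values):
    """Return the most common item; ties go to the item seen first.

    Hypothetical helper, named here for illustration only.
    """
    counts = Counter(values)  # dict subclass, keeps first-seen order
    # max() returns the first maximal key in iteration order
    return max(counts, key=counts.get)


def filldf(df, col="reading", k=3):
    """Fill each NaN in `col` using the k rows before and the k rows after it."""
    original = df[col]      # windows are read from the unfilled values
    out = original.copy()
    for pos in np.flatnonzero(original.isna().to_numpy()):
        # positional window, clamped at the start of the frame
        window = original.iloc[max(0, pos - k):pos + k + 1].dropna().tolist()
        if window:          # leave the NaN alone if every neighbour is missing
            out.iloc[pos] = most_frequent_first(window)
    return df.assign(**{col: out})


df = pd.DataFrame({"reading": ["talk", "kill", np.nan, "vertical", "type",
                               "kill", np.nan, "vertical", "vertical", "type",
                               "durable", np.nan, "durable", "vertical"]})
filled = filldf(df)
print(filled["reading"].tolist())
```

For index 11 the window holds vertical, type, durable, durable, vertical, so vertical and durable are tied at two each. vertical appears first (index 8), which is why it wins there and the result matches the expected output in the question.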