Search code examples
python-3.xpandasmissing-data

How to fill NaN values based on the top and bottom strings with highest frequency


I have a dataframe of string values with missing values in it. It needs to be populated/filled by the below conditions.

  • From the NaN value index , Check the last 3 rows and next 3 rows and replace the NaN with the most frequent/repeated value out of 6 rows.
  • If there is 2 strings with an equal amount of frequency occurred from the last 3 rows and next 3 rows , replace the NaN with the value that has lowest index out of theses 6 rows.

My DataFrame:

     reading
0       talk
1       kill
2        NaN
3   vertical
4       type
5       kill
6        NaN
7   vertical
8   vertical
9       type
10   durable
11       NaN
12   durable
13  vertical

Expected output:

     reading
0       talk
1       kill
2       kill
3   vertical
4       type
5       kill
6   vertical
7   vertical
8   vertical
9       type
10   durable
11  vertical
12   durable
13  vertical

Here is the minimum reproducible code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'reading':['talk','kill',np.NAN,'vertical','type','kill',np.NAN,'vertical','vertical','type','durable',np.NAN,'durable','vertical']}) 

def filldf(df):
    # Do the logic here
    return df

I am not sure how to approach this problem. Any help will be appreciated !!


Solution

  • If you don't have too many NaN values, you can iterate over the index of NaN "reading" values and simply look for the mode of the surrounding 6 values of it (use iloc to get the first occurrence of multiple modes) and assign the values back to the corresponding "NaN" values

    msk = df['reading'].isna()
    df.loc[msk, 'reading'] = [df.loc[min(0, i-3):i+3, 'reading'].mode().iloc[0] for i in  df.index[msk]]
    

    Output:

         reading
    0       talk
    1       kill
    2       kill
    3   vertical
    4       type
    5       kill
    6   vertical
    7   vertical
    8   vertical
    9       type
    10   durable
    11  vertical
    12   durable
    13  vertical