Search code examples
pythonpandasdataframefunctionfillna

How to fill empty NaN data with median values taken from the list based on other columns that are similar in value


It is necessary to replace the empty Nan values (the 'area' column) with the median value from other areas (equal to or less than the 'total' value). For example: Line 2 has the value 'total'==8. Select the table with the values 'total'==8, look for the median for 'area'.median() and write the value (if any). If there is no value, then reduce the 'total' by 1 and search further. And in line 6 there is the value 'total'==59. Therefore, let's take the median of 'total'==56 and the value of 'area'=34.

Here's what the data should look like: result

import pandas as pd
import numpy as np

df = pd.DataFrame({'total': [5, 8, 8, 8, 20, 56, 59], \
'area': [40, 51, 53, np.nan, np.nan, 34, np.nan]})
df

#   total   area
0   5   40.0
1   8   51.0
2   8   53.0
3   8   NaN
4   20  NaN
5   56  34.0
6   59  NaN
result = pd.DataFrame({'total': [5, 8, 8, 8, 20, 56, 59], 'area': [40, 51, 53, 52, 52, 34, 34]})
result

#   total   area
0   5   40
1   8   51
2   8   53
3   8   52
4   20  52
5   56  34
6   59  34

I have created a function, but it does not produce the desired result:

def find_area(total_num, x=1):
    while x > 0:
        y = df.query('total == @total_num')['area'].sum()
        if y > 0:
            return df.query('total == @total_num')['area'].median()
            x=0
            break
        else:
            total_num -= 1

df['area'] = df['area'].fillna(find_area)
df

Solution

  • First compute the median per group (with groupby.median), get rid of the NaNs with dropna, and perform a merge_asof to map the missing values. Use the output to fill the NaN by boolean indexing:

    tmp = df.groupby('total')['area'].median().dropna()
    
    df.loc[df['area'].isna(), 'area'] = pd.merge_asof(
        df['total'].reset_index(), tmp, on='total'
    ).set_index('index')['area']
    

    Output:

       total  area
    0      5  40.0
    1      8  51.0
    2      8  53.0
    3      8  52.0
    4     20  52.0
    5     56  34.0
    6     59  34.0