Search code examples
pythonpandasmissing-data

Backfilling and Forwardfilling NaNs and Zeros


I am trying to back/forward fill the work experience (years) of employees. What I am trying to achieve is:

Employee 200

2019 - 3 yrs, 2018 - 2 yrs, 2017 - 1 yr

Employee 300

Keep as Nan

Employee 400

2018 - 3 yrs, 2017 - 2 yrs

Employee 500

2018 - 6 yrs, 2017 - 5 yrs, 2016 - 4 yrs

I am really struggling to get it to backfill (forwardfill) by increments of -1 (+1). Even trickier if the non-NaN/zero value is in the middle as in the case of employee 500.

df_test = pd.DataFrame({'DeptID':[0,0,0,1,1,1,2,2,2],
                        'Employee':[200, 200, 200, 300, 400, 400, 500, 500, 500],
                        'Year':[2017, 2018, 2019, 2016, 2017, 2018, 2016, 2017, 2018],
                        'Experience':[np.nan , np.nan, 3, np.nan, 2, np.nan, 0, 5, 0]
                       })

Solution

  • Assuming there's a single nonzero and non-nan experience for each employee, try this

    df_test = pd.DataFrame({'DeptID':[0,0,0,1,1,1,2,2,2],
                            'Employee':[200, 200, 200, 300, 400, 400, 500, 500, 500],
                            'Year':[2017, 2018, 2019, 2016, 2017, 2018, 2016, 2017, 2018],
                            'Experience':[np.nan , np.nan, 3, np.nan, 2, np.nan, 0, 5, 0]
                           })
    
    # find the last nonzero, non-nan value for each employee
    nonzero = df_test[df_test.Experience.ne(0) & df_test.Experience.notna()].drop_duplicates('Employee', keep='last').reset_index().set_index('Employee')
    # map the difference between experience and index of the nonzero value of the employees to employee column
    # add it to index
    df_test['Experience'] = df_test.index + df_test.Employee.map(nonzero.Experience - nonzero['index'])
    df_test
    

    enter image description here