Search code examples
pythonpandasdataframerolling-computationweighted-average

Rolling weighted average (or similar) filling for missing pandas


I've got pandas dataframe with values "along path", with distances, but some of the values are missing. Dataframe looks like this:

    Idx AccumDist   ValT
0   1   3059    112
1   2   4281    194
2   3   4947    NaN
3   4   5460    NaN
4   5   5811    543
5   6   6021    591
6   7   6289    NaN
7   8   7487    909
8   9   8031    954
9   10  8242    1069

AccumDist and ValT is accumulated data, and some of the ValT are "missing". What I want to do is to replace NaN in column ValT with averages to from closest "known" data in this column, weighted by AccumDist difference.

Because there could be missing ValT value after another (might be more than 1) I can't use rolling weighted average (or I don't know how to use it), while I still want to average only "closest known" values.

I was thinking about using df.shift(), maybe on some dataframe subset(s), but I'm not really sure how to do it. Thanks for any help or suggestions.


Solution

  • Use interpolate based on the accumulated distance. Set that as the index and use method='index'. Then assign this calculation back to your DataFrame using the like-sized numpy array.

    df['ValT_filled'] = df.set_index('AccumDist')['ValT'].interpolate(method='index').to_numpy()
    
    #   Idx  AccumDist    ValT  ValT_filled
    #0    1       3059   112.0   112.000000
    #1    2       4281   194.0   194.000000
    #2    3       4947     NaN   345.917647
    #3    4       5460     NaN   462.935294
    #4    5       5811   543.0   543.000000
    #5    6       6021   591.0   591.000000
    #6    7       6289     NaN   649.133697
    #7    8       7487   909.0   909.000000
    #8    9       8031   954.0   954.000000
    #9   10       8242  1069.0  1069.000000