I've got pandas dataframe with values "along path", with distances, but some of the values are missing. Dataframe looks like this:
Idx AccumDist ValT
0 1 3059 112
1 2 4281 194
2 3 4947 NaN
3 4 5460 NaN
4 5 5811 543
5 6 6021 591
6 7 6289 NaN
7 8 7487 909
8 9 8031 954
9 10 8242 1069
AccumDist
and ValT
is accumulated data, and some of the ValT
are "missing".
What I want to do is to replace NaN in column ValT
with averages to from closest "known" data in this column, weighted by AccumDist
difference.
Because there could be missing ValT
value after another (might be more than 1) I can't use rolling weighted average (or I don't know how to use it), while I still want to average only "closest known" values.
I was thinking about using df.shift()
, maybe on some dataframe subset(s), but I'm not really sure how to do it. Thanks for any help or suggestions.
Use interpolate
based on the accumulated distance. Set that as the index and use method='index'
. Then assign this calculation back to your DataFrame using the like-sized numpy array.
df['ValT_filled'] = df.set_index('AccumDist')['ValT'].interpolate(method='index').to_numpy()
# Idx AccumDist ValT ValT_filled
#0 1 3059 112.0 112.000000
#1 2 4281 194.0 194.000000
#2 3 4947 NaN 345.917647
#3 4 5460 NaN 462.935294
#4 5 5811 543.0 543.000000
#5 6 6021 591.0 591.000000
#6 7 6289 NaN 649.133697
#7 8 7487 909.0 909.000000
#8 9 8031 954.0 954.000000
#9 10 8242 1069.0 1069.000000