I found it really hard to articulate this problem, but it is a similar problem to this one.
For example, let's say I have a table like this:
Label | Value |
---|---|
A | 1 |
B | 2 |
C | NaN |
D | NaN |
E | NaN |
F | 12 |
It needs to look like this:
Label | Value |
---|---|
A | 1 |
B | 2 |
C | 3 |
D | 3 |
E | 3 |
F | 3 |
I took the next available value (12) and divided it by the number of NaN values + 1 (12 / 4 = 3), then replaced the NaN values AND the original value used for the interpolation (12) with 3. It is similar to the previous question, but it also modifies the original value used for the interpolation.
df.interpolate() is not capable of doing this. This is what I have so far; it gets the job done, but it is not very elegant:

import numpy as np
import pandas as pd

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                     'Value': [1, 2, None, None, None, 12, None, None, 4]})

# 1 marks rows that already have a value; the shifted cumsum puts each run of
# NaNs into the same group as the non-NaN value that ends it
test['break'] = np.where(test['Value'].notnull(), 1, 0)
test['group'] = test['break'].shift().fillna(0).cumsum()

# fill the NaNs in each group with (non-NaN value) / (group size)
test['Value2'] = test.groupby('group').Value.apply(lambda x: x.fillna(x.max() / len(x)))

# also overwrite the value that was used for the fill
for row in range(0, test.shape[0] - 1):
    if test['break'].iloc[row] == 0 and test['break'].iloc[row + 1] == 1:
        test.at[row + 1, 'Value2'] = test['Value2'].iloc[row]
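On the sample data the helper columns come out like this (worked out by hand):

>>> test['break'].tolist()
[1, 1, 0, 0, 0, 1, 0, 0, 1]
>>> test['group'].tolist()
[0.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
>>> test['Value2'].round(4).tolist()
[1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 1.3333, 1.3333, 1.3333]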
Perhaps something like this?
test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                     'Value': [1, 2, None, None, None, 12, None, None, 4]})

tr = test.assign(
    g=(~test['Value'].isna())[::-1].cumsum()
).groupby('g')['Value'].transform
df = test.assign(Value=tr('last') / tr('size'))
>>> df
  Label     Value
0     A  1.000000
1     B  2.000000
2     C  3.000000
3     D  3.000000
4     E  3.000000
5     F  3.000000
6     G  1.333333
7     H  1.333333
8     I  1.333333
With the .assign(g=...) we make groups of values that each end with a non-NaN value preceded by zero or more NaNs:
>>> test.assign(
...     g=(~test['Value'].isna())[::-1].cumsum()
... )
  Label  Value  g
0     A    1.0  4
1     B    2.0  3
2     C    NaN  2
3     D    NaN  2
4     E    NaN  2
5     F   12.0  2
6     G    NaN  1
7     H    NaN  1
8     I    4.0  1
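The [::-1] just makes the cumulative count of non-NaN values run from the bottom of the frame upwards; .assign then realigns the result on the index, so the rows keep their original order. Counting from the last row (I) up to the first (A), the raw counts are:

>>> (~test['Value'].isna())[::-1].cumsum().tolist()
[1, 1, 1, 2, 2, 2, 2, 3, 4]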
Then, we use .groupby('g')['Value'].transform twice: to get the last() value of each group (last() skips NaNs, so it picks up the group's non-NaN value), and to divide it by the size() of the group.
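For example, on the sample frame the two intermediate transforms come out as:

>>> tr('last').tolist()
[1.0, 2.0, 12.0, 12.0, 12.0, 12.0, 4.0, 4.0, 4.0]
>>> tr('size').tolist()
[1, 1, 4, 4, 4, 4, 3, 3, 3]

Dividing the first by the second, element-wise, gives the Value column shown above.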