Search code examples
pythonpandasdataframeinterpolation

In pandas: Interpolate between two rows such that the sum of interpolated values + second row = old value of the second row


I found it really hard to articulate this problem, but it is a similar problem to this one.

For example, let's say I have a table like this:

Label Value
A 1
B 2
C Nan
D Nan
E Nan
F 12

It needs to look like this:

Label Value
A 1
B 2
C 3
D 3
E 3
F 3

Used the next available value (12) and divided it by the number of Nan values + 1 (12/4) = 3 and replaced the Nan values AND the original value used to interpolate(12) by 3. It is similar to the previous question but also modifies the original value used for interpolation.

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F','G','H','I'],
                     'Value': [1, 2, None, None, None, 12,None,None,4]})

test['break'] = np.where(test['Value'].notnull(),1,0)
test['group'] = test['break'].shift().fillna(0).cumsum()
test['Value2'] = test.groupby('group').Value.apply(lambda x: x.fillna( x.max() / len(x)))

for row in range(0,test.shape[0]):
    if test['break'].iloc[row] == 0 and test['break'].iloc[row+1] == 1:
        test.at[row+1, 'Value2'] = test['Value2'].iloc[row]

df.interpolate() is not capable of doing this, and this is what I have so far. It gets the job done, but is not very elegant


Solution

  • Perhaps something like this?

    test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F','G','H','I'],
                         'Value': [1, 2, None, None, None, 12,None,None,4]})
    
    tr = test.assign(
        g=(~test['Value'].isna())[::-1].cumsum()
    ).groupby('g')['Value'].transform
    
    df = test.assign(Value=tr('last') / tr('size'))
    
    >>> df
      Label     Value
    0     A  1.000000
    1     B  2.000000
    2     C  3.000000
    3     D  3.000000
    4     E  3.000000
    5     F  3.000000
    6     G  1.333333
    7     H  1.333333
    8     I  1.333333
    

    Explanation

    With the .assign(g=...) we make groups of values that end with a non-NaN values and preceded by zero or more NaN:

    >>> test.assign(
    ...     g=(~test['Value'].isna())[::-1].cumsum()
    ... )
    
      Label  Value  g
    0     A    1.0  4
    1     B    2.0  3
    2     C    NaN  2
    3     D    NaN  2
    4     E    NaN  2
    5     F   12.0  2
    6     G    NaN  1
    7     H    NaN  1
    8     I    4.0  1
    

    Then, we use .groupby('g')['Values'].transform twice: to get the last() value, and to divide it by the size() of the group.