Search code examples
pythonpandasdataframeinterpolationrolling-computation

Stuck at interpolating with a rolling window (Python, Pandas, DataFrame)


I am attempting to interpolate the following time-series data, sample: enter image description here

For example, the column with the highlighted cell (col name: -6.704) has a number of 0s. I want to interpolate by having a rolling window of size 4 (to see ahead up to -6.672 in this case), and fill any 0.0s in the current column (in this case, -6.704) with the first data value to the right of that column that is not 0.0 in the equivalent row within the rolling window. If all the four adjacent cells in the row also have 0.0s, (eg, rows 4,5,7,10), then the value remains 0.0.

The result for column -6.704 should therefore be: col values: 33.33.., 16.66.., 33.33.., 0.0.., 0.0.., 41.11.., 0.00, 11.11..,16.67.., 0.00, 16.67.

I experimented a bit with the rolling method in Pandas (size 4, freq. 1), but the examples I encounter are all related to ruling mean.

Would anyone have any ideas on how best to solve this please? Or perhaps point me to the right direction?

Thank you

I tried the pandas rolling method, but I am unsure which parameters to use and whether it is indeed the correct method for such a problem.


Solution

  • How about:

    dfnew = df.replace(0, float('nan')).bfill(axis=1, limit=4).fillna(0)
    

    Example

    First, let's build a simple, reproducible setup:

    import numpy as np
    
    n, m = 5, 10
    np.random.seed(0)
    val = np.random.normal(size=(n, m))
    for i in range(1, n):
        val[i, 0:i] = 0
        val[i-1, 4:5+i] = 0
    df = pd.DataFrame(val)
    

    Side note: use styler to highlight zeros:

    # to highlight 0s in red:
    def zero_red(x):
        return 'color:red; font-weight:bold' if x == 0 else None
    
    df.style.applymap(zero_red).format(precision=2)
    

    Then:

    dfnew = df.replace(0, float('nan')).bfill(axis=1, limit=4).fillna(0)
    
    dfnew.style.applymap(zero_red).format(precision=2)
    

    Note the cell at .iloc[3, 4] is not changed, as the closest non-zero value to the right is too far.