I know that this question has been asked before, but the suggested solutions that I found do not work for me. Maybe I am trying to do something that is simply not possible, but let me explain.
I have time-series data that has some values of 0. I would like to interpolate the zeros in data using pandas.DataFrame.interpolate.
The code:
import pandas as pd
import numpy as np

data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]
df = pd.DataFrame(data=data)  # Data to pandas dataframe
df.replace(to_replace=0, value=np.nan, inplace=True)  # Replace 0 by nan
ip = df.interpolate(method="nearest", order=3, limit=None,
                    limit_direction=None)
print(ip)
The result of print(ip):
0
0 NaN
1 -1.315270
2 -2.254480
3 -0.965348
4 -1.111680
5 -0.050605
6 -0.605522
7 2.013370
8 2.013370
9 2.419310
10 2.419310
11 0.821425
12 0.402411
13 NaN
The problem: Pandas does not interpolate the first and last values of data, but leaves them as NaN (the replaced zeros). I tried the forward and backward options of pandas.DataFrame.interpolate, but none of them interpolates the first and last zero of data. Is this simply impossible via Pandas, or am I doing something wrong?
What you want is extrapolation rather than interpolation: the first and last values have a valid neighbor on only one side, so you need to decide how to fill them.
You can ffill/bfill:
ip = (df.interpolate(method="nearest", order=3, limit=None,
                     limit_direction='both')
        .ffill().bfill()
      )
Output:
0
0 -1.315270
1 -1.315270
2 -2.254480
3 -0.965348
4 -1.111680
5 -0.050605
6 -0.605522
7 2.013370
8 2.013370
9 2.419310
10 2.419310
11 0.821425
12 0.402411
13 0.402411
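To see which positions interpolation leaves alone, it can help to separate the interior gaps from the edges. The following is only a sketch under a few assumptions: it works on a Series rather than a DataFrame, and it swaps method="nearest" for the default linear method so that SciPy is not required. The variable names `inside` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]
s = pd.Series(data).replace(0, np.nan)  # treat zeros as missing

# limit_area="inside" fills only NaNs that have valid values on both sides,
# so the leading and trailing NaN are left untouched (no extrapolation).
# Note: linear interpolation is used here instead of "nearest" (assumption).
inside = s.interpolate(method="linear", limit_area="inside")
still_missing = np.flatnonzero(inside.isna().to_numpy())
print(still_missing)  # positions that would need extrapolation

# Extrapolate the edges by repeating the nearest valid value.
filled = inside.ffill().bfill()
print(filled)
```

With this split, the interior gaps are interpolated and ffill/bfill only ever touches the edge positions.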
Or use a spline to fill only the values that the nearest-neighbor interpolation leaves as NaN:
ip = (df.interpolate(method="nearest", order=3, limit=None,
                     limit_direction=None)
        .fillna(
            df.interpolate(method="spline", order=3, limit=None,
                           limit_direction='both')
        )
      )
Output:
0
0 -0.585237
1 -1.315270
2 -2.254480
3 -0.965348
4 -1.111680
5 -0.050605
6 -0.605522
7 2.013370
8 2.013370
9 2.419310
10 2.419310
11 0.821425
12 0.402411
13 -1.951716
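The edge values depend strongly on the extrapolation rule you pick. As a third, simpler option, the sketch below continues a straight line through the two nearest valid points at each end. It is an illustration only, not a pandas feature: the helper name `extrapolate_edges_linear` is hypothetical, it assumes a Series with at least two valid values, and the interior gaps are filled with linear rather than nearest interpolation to avoid the SciPy dependency.

```python
import numpy as np
import pandas as pd


def extrapolate_edges_linear(s: pd.Series) -> pd.Series:
    """Fill leading/trailing NaNs along a line through the two nearest valid points.

    Hypothetical helper for illustration; interior NaNs are left as they are.
    """
    vals = s.to_numpy(dtype=float)
    result = vals.copy()
    pos = np.flatnonzero(~np.isnan(vals))  # positions of valid values
    if len(pos) < 2:
        return s.copy()  # not enough points to define a line

    # Leading edge: line through the first two valid points.
    x0, x1 = pos[0], pos[1]
    slope = (vals[x1] - vals[x0]) / (x1 - x0)
    lead = np.arange(0, x0)
    result[lead] = vals[x0] + slope * (lead - x0)

    # Trailing edge: line through the last two valid points.
    x0, x1 = pos[-2], pos[-1]
    slope = (vals[x1] - vals[x0]) / (x1 - x0)
    trail = np.arange(x1 + 1, len(vals))
    result[trail] = vals[x1] + slope * (trail - x1)

    return pd.Series(result, index=s.index)


data = [0, -1.31527, -2.25448, -0.965348, -1.11168, -0.0506046, -0.605522,
        2.01337, 0, 0, 2.41931, 0.821425, 0.402411, 0]
s = pd.Series(data).replace(0, np.nan)

# Fill interior gaps first, then extrapolate only the edges.
interior = s.interpolate(method="linear", limit_area="inside")
result = extrapolate_edges_linear(interior)
print(result)
```

Whether repeating the nearest value, a spline, or a straight line is appropriate depends on what the series represents, so compare the edge values each approach produces before settling on one.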