I searched a lot here for an answer that solves this but couldn't find one. The desired result is to fill a gap only when the values at both ends of it are equal, filling at most 4 values per gap:
My dataset:
0 NaN
1 NaN
2 NaN
3 5.0
4 5.0
5 NaN
6 NaN
7 5.0
8 6.0
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 5.0
16 5.0
17 NaN
18 NaN
19 6.0
20 6.0
21 NaN
22 NaN
23 NaN
24 NaN
25 5.0
26 NaN
27 NaN
28 NaN
29 NaN
30 NaN
31 NaN
32 NaN
33 5.0
34 NaN
35 NaN
The desired result (fill a gap only when the values at both ends are equal, filling at most 4 values per gap):
0 NaN # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
1 NaN # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
2 NaN # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
3 5.0 # Original dataset
4 5.0 # Original dataset
5 5.0 # Filled since the gap starts with 5 and ends with 5 (and is smaller than 4 values)
6 5.0 # Filled since the gap starts with 5 and ends with 5 (and is smaller than 4 values)
7 5.0 # Original dataset
8 6.0 # Original dataset
9 NaN # Not filled since the gap starts with 6 and ends with 5
10 NaN .
11 NaN .
12 NaN .
13 NaN .
14 NaN # Not filled since the gap starts with 6 and ends with 5
15 5.0 # Original dataset
16 5.0 # Original dataset
17 NaN # Not filled since the gap starts with 5 and ends with 6
18 NaN # Not filled since the gap starts with 5 and ends with 6
19 6.0 # Original dataset
20 6.0 # Original dataset
21 NaN # Not filled since the gap starts with 6 and ends with 5
22 NaN .
23 NaN .
24 NaN # Not filled since the gap starts with 6 and ends with 5
25 5.0 # Original dataset
26 5.0 # Filled since the gap starts with 5 and ends with 5
27 5.0 # Filled since the gap starts with 5 and ends with 5
28 5.0 # Filled since the gap starts with 5 and ends with 5
29 5.0 # Filled since the gap starts with 5 and ends with 5
30 NaN # Not filled since maximum gap is 4
31 NaN # Not filled since maximum gap is 4
32 NaN # Not filled since maximum gap is 4
33 5.0 # Original dataset
34 NaN # Not filled since the gap starts with 5 but this is the dataset end (don't know how it ends)
35 NaN # Not filled since the gap starts with 5 but this is the dataset end (don't know how it ends)
We can use boolean masking and cumsum to identify the blocks of NaN values that start and end with the same value, then group the column on these blocks and forward fill with a limit of 4:
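s = df['col']
m = s.notna()  # True where a value is present
# 1. mask: hide each value whose next non-NaN value differs, so it cannot be propagated
# 2. groupby(m.cumsum()): one group per value plus the NaN run that follows it
# 3. ffill(limit=4): fill at most 4 NaN per group; fillna(s): restore the hidden values
s.mask(s[m] != s[m].shift(-1)).groupby(m.cumsum()).ffill(limit=4).fillna(s)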
0 NaN
1 NaN
2 NaN
3 5.0
4 5.0
5 5.0
6 5.0
7 5.0
8 6.0
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 5.0
16 5.0
17 NaN
18 NaN
19 6.0
20 6.0
21 NaN
22 NaN
23 NaN
24 NaN
25 5.0
26 5.0
27 5.0
28 5.0
29 5.0
30 NaN
31 NaN
32 NaN
33 5.0
34 NaN
35 NaN
Name: col, dtype: float64
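If the one-liner is hard to follow, the same idea can be spelled out step by step. This is only a sketch of an alternative and not the code above: the function name `fill_equal_bounded_gaps` and its `limit` parameter are made up for illustration, and the series is rebuilt by hand from the data in the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def fill_equal_bounded_gaps(s: pd.Series, limit: int = 4) -> pd.Series:
    """Fill NaN gaps whose neighbouring values are equal, up to `limit` rows per gap.

    Hypothetical helper written for this example.
    """
    valid = s.notna()
    prev_val = s.ffill()  # last value seen at or before each row
    next_val = s.bfill()  # next value seen at or after each row
    # NaN never compares equal to NaN, so leading and trailing gaps drop out here
    bounded = ~valid & (prev_val == next_val)
    # 1-based position of each NaN inside its gap (0 on rows that hold a value)
    pos_in_gap = (~valid).astype(int).groupby(valid.cumsum()).cumsum()
    return s.mask(bounded & (pos_in_gap <= limit), prev_val)


# The dataset from the question, rebuilt by hand (index 0 to 35)
nan = np.nan
s = pd.Series(
    [nan] * 3 + [5.0, 5.0] + [nan] * 2 + [5.0, 6.0] + [nan] * 6
    + [5.0, 5.0] + [nan] * 2 + [6.0, 6.0] + [nan] * 4
    + [5.0] + [nan] * 7 + [5.0] + [nan] * 2,
    name="col",
)

result = fill_equal_bounded_gaps(s)
print(result.to_string())
```

As in the desired output, a gap longer than `limit` is filled only for its first `limit` rows (rows 26 to 29 here) and the rest stays NaN. If such gaps should be left completely untouched instead, the per-gap length would need to be compared against `limit` rather than the position inside the gap.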