I have a set of data, where I predict the amount of fuel I need around 10 weeks ahead. I have it all set up in a single dataframe presented as staircase date. This means, the closer I come to the last entry for a week the more accurate the values get. I want to cut all missing values and ignore the exact date so I can just look at my predictions in relation to the distance of the predicted week.
Input dataframe:
Index 2020-01 2020-02 2020-03 2020-04 2020-05 2020-06
1. 10 10 5 0 0 0
2. 0 5 5 10 0 0
3. 0 0 10 4 3 0
4. 0 0 0 1 7 6
Outcome should be:
Index W1 W2 W3
1. 10 10 5
2. 5 5 10
3. 10 4 3
4. 1 7 6
Many Thanks in advance
You could replace the zeros with NaNs and reset the Series per row:
df2 = (
df.replace(0,float('nan'))
.apply(lambda s: s.dropna().reset_index(drop=True), axis=1)
.astype(int)
)
df2.columns = df2.columns.map(lambda x: f'W{x+1}')
output:
W1 W2 W3
1.0 10 10 5
2.0 5 5 10
3.0 10 4 3
4.0 1 7 6
I set 2.
/2020-03
to 0
for this example
You can use ffill
+bfill
to compute a mask:
m = df.ne(0)
m1 = m.where(m).bfill(axis=1)
m2 = m.where(m).ffill(axis=1)
df2 = (
df.where(m1&m2) # internal 0s are those True for both ffill/bfill
.apply(lambda s: s.dropna().reset_index(drop=True), axis=1)
.astype(int)
)
df2.columns = df2.columns.map(lambda x: f'W{x+1}')
output:
W1 W2 W3
1.0 10 10 5
2.0 5 0 10
3.0 10 4 3
4.0 1 7 6