Search code examples
pythonpandasdataframefillna

Fill pandas column forward iteratively, but without using iteration?


I have a pandas data frame with a column where a condition is met based on other elements in the data frame (not shown). Additionally, I have a column that extends the validness one row further with the following rule:

If a valid row is followed directly by ExtendsValid, that row is also valid, even if the underlying valid condition doesnt apply. Continue filling valid forward as long as ExtendsValid is 1

I have illustrated the result in column "FinalValid" (desired result. Doesnt have to be a new column, can also fill Valid forward). Note that rows 8 and 9 in the example also become valid. Also note that row 13 does NOT result in FinalValid, because you need a preceding valid row. Preceding valid row can be Valid or an extended valid row.

So far if I had a problem like that I did a cumbersome multi-step process:

  1. Create a new column for when "Valid" or "ExtendsValid" is true
  2. Create a new column marking the start point for each "sub-series" (a consecutive set of ones)
  3. Number each sub-series
  4. fillna using "group by" for each sub series

I can provide sample code but I am really looking for a totally different, more efficient approach, which of course must be non-iterating as well.

Any ideas would be welcome.

# Valid ExtendsValid FinalValid
1 0 0 0
2 1 0 1
3 0 0 0
4 0 0 0
5 1 0 1
6 0 0 0
7 1 0 1
8 0 1 1
9 0 1 1
10 0 0 0
11 1 0 1
12 0 0 0
13 0 1 0
14 0 0 0

Solution

  • IIUC, you want to ffill the 1s only if there is an uninterrupted series of 1s starting on Valid and eventually continuing on ExtendsValid.

    For this you can use a groupby.cummin:

    df['FinalValid'] = (
     (df['Valid']|df['ExtendsValid'])
     .groupby(df['Valid'].cumsum())
     .cummin()
     )
    

    Output:

    NB. I slightly modified the input on row 3 to better illustrate the logic.

         #  Valid  ExtendsValid  FinalValid
    0    1      0             0           0
    1    2      1             0           1
    2    3      0             0           0
    3    4      0             1           0
    4    5      1             0           1
    5    6      0             0           0
    6    7      1             0           1
    7    8      0             1           1
    8    9      0             1           1
    9   10      0             0           0
    10  11      1             0           1
    11  12      0             0           0