Fill cols where equal to value, until another value - Pandas


I'm trying to forward-fill values in two columns of a df based on a separate column, and to keep filling until a condition is met. Using the df below: where Val1 and Val2 are both equal to C, I want to fill the subsequent rows with C until a string in Code begins with one of ['FR','GE','GA'].

import pandas as pd
import numpy as np

df = pd.DataFrame({   
    'Code' : ['CA','GA','YA','GE','XA','CA','YA','FR','XA'],             
    'Val1' : ['A','B','C','A','B','C','A','B','C'],                 
    'Val2' : ['A','B','C','A','B','C','A','B','C'],
   })

mask = (df['Val1'] == 'C') & (df['Val2'] == 'C')

cols = ['Val1', 'Val2']

df[cols] = np.where(mask, df[cols].ffill(), df[cols])

Intended output:

  Code Val1 Val2
0   CA    A    A
1   GA    B    B
2   YA    C    C
3   GE    A    A
4   XA    B    B
5   CA    C    C
6   YA    C    C
7   FR    B    B
8   XA    C    C

Note: the strings in Code are shortened to two characters here but are longer in my dataset, so I'm hoping to use startswith.


Solution

  • This is similar to a start/stop signal problem that I have answered before, but I couldn't find that answer. So here's the solution, along with the other things you mentioned:

    # check for C
    is_C = df.Val1.eq('C') & df.Val2.eq('C')
    
    # check for start substring with regex
    startswith = df.Code.str.match("^(FR|GE|GA)")
    
    # merge the two series into one signal:
    # 0 where Code matches a stop prefix, 1 where both values are C, NaN elsewhere
    mask = np.select([startswith, is_C], [0, 1], np.nan)
    
    # forward-fill the signal so each row inherits the latest start/stop
    # rows from an `is_C` up to (not including) the next `startswith` become True
    mask = pd.Series(mask, index=df.index).ffill().fillna(0).astype(bool)
    
    # update the dataframe
    df.loc[mask, ['Val1','Val2']] = 'C'
    

    Output

      Code Val1 Val2
    0   CA    A    A
    1   GA    B    B
    2   YA    C    C
    3   GE    A    A
    4   XA    B    B
    5   CA    C    C
    6   YA    C    C
    7   FR    B    B
    8   XA    C    C
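
Since the question mentions wanting startswith, the same start/stop idea can be sketched with `Series.str.startswith` instead of a regex. This is only a sketch under some assumptions: the helper name `fill_until_stop` and its parameters are made up for illustration, and passing a tuple of prefixes to `str.startswith` assumes pandas 1.5 or newer (on older versions, the `str.match` regex above should still work).

```python
import numpy as np
import pandas as pd


def fill_until_stop(df, cols, value, stop_prefixes, code_col="Code"):
    """Hypothetical helper: set `cols` to `value` from each row where all
    `cols` equal `value` until a row whose `code_col` starts with a stop prefix."""
    # start signal: every target column equals the value
    start = df[cols].eq(value).all(axis=1).to_numpy(dtype=bool)

    # stop signal: Code begins with any of the prefixes
    # (tuple argument assumes pandas >= 1.5; na=False treats missing codes as no match)
    stop = df[code_col].str.startswith(tuple(stop_prefixes), na=False).to_numpy(dtype=bool)

    # stop is listed first, so it wins if a row is both a start and a stop
    signal = pd.Series(np.select([stop, start], [0.0, 1.0], default=np.nan), index=df.index)

    # carry the latest start/stop forward; rows before any signal count as "off"
    mask = signal.ffill().fillna(0).astype(bool)

    out = df.copy()
    out.loc[mask, cols] = value
    return out


df = pd.DataFrame({
    'Code': ['CA', 'GA', 'YA', 'GE', 'XA', 'CA', 'YA', 'FR', 'XA'],
    'Val1': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Val2': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
})

result = fill_until_stop(df, ['Val1', 'Val2'], 'C', ['FR', 'GE', 'GA'])
print(result)
```

On the sample df this should give the same frame as the output above. Working on a copy keeps the original df untouched, which makes it easier to compare before and after.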