Tags: python, pandas, logical-operators

Python Logical Operations as conditions in Pandas


I have a DataFrame with two boolean columns:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False]
})

df

      A      B
0   False   True
1   True    False
2   False   False
3   False   False
4   False   True
5   False   True
6   True    False
7   True    False
8   False   False
9   True    False

How can I identify and mark the first [True, False] row that occurs after a [False, False] row? Every row that satisfies this condition needs to be flagged in a new column.

In the example above, row 3 [False, False] is followed by row 6 [True, False], and row 8 [False, False] is followed by row 9 [True, False].

These are the only valid matches in this example.


Solution

  • You could build two boolean masks, form groups with a cumulative sum, and take the first matching row per group:

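    # m1: rows where both columns are False (each one starts a new group)
    m1 = df.eq([False, False]).all(axis=1)
    # m2: rows matching the condition to flag (A is True, B is False)
    m2 = df.eq([True, False]).all(axis=1)
    # number the groups: the counter increases at every [False, False] row
    group = m1.cumsum()
    
    # keep only rows that match the condition and come after a group start
    # (group > 0), then take the first such row in each group
    idx = m2[m2 & (group>0)].groupby(group).idxmax().tolist()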
    
    # variant
    # idx = m2.index.to_series()[m2 & (group>0)].groupby(group).first().tolist()
    
    # assign flag
    df.loc[idx, 'flag'] = 'X'
    

    Output:

           A      B flag
    0  False   True  NaN
    1   True  False  NaN
    2  False  False  NaN
    3  False  False  NaN
    4  False   True  NaN
    5  False   True  NaN
    6   True  False    X
    7   True  False  NaN
    8  False  False  NaN
    9   True  False    X
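
If a boolean column is more convenient than the 'X'/NaN marker for later filtering, the same index labels can be turned into a mask. This is a minimal sketch on the example data, not part of the original answer; the column name `flag_bool` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

# same masks and grouping as in the answer
m1 = df.eq([False, False]).all(axis=1)
m2 = df.eq([True, False]).all(axis=1)
group = m1.cumsum()
idx = m2[m2 & (group > 0)].groupby(group).idxmax().tolist()

# boolean flag: True only at the selected index labels (hypothetical column name)
df['flag_bool'] = df.index.isin(idx)

# the flagged rows can then be selected directly
print(df[df['flag_bool']])
```

With this layout there are no NaN values in the flag column, so it can be used as a mask without further cleanup.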
    

    Intermediates:

           A      B     m1     m2  group flag
    0  False   True  False  False      0     
    1   True  False  False   True      0     
    2  False  False   True  False      1     
    3  False  False   True  False      2     
    4  False   True  False  False      2     
    5  False   True  False  False      2     
    6   True  False  False   True      2    X
    7   True  False  False   True      2     
    8  False  False   True  False      3     
    9   True  False  False   True      3    X
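
A table like the one above can be rebuilt by placing the intermediate Series next to the original columns, which helps when checking the logic on other data. A minimal sketch, assuming the same example data; the name `debug` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

# start of a group: both columns are False
m1 = df.eq([False, False]).all(axis=1)
# condition to flag: A is True and B is False
m2 = df.eq([True, False]).all(axis=1)
# each [False, False] row increments the group counter
group = m1.cumsum()

# first row satisfying m2 within each group that follows a start row
idx = m2[m2 & (group > 0)].groupby(group).idxmax().tolist()

# side-by-side view of the inputs and the intermediate Series
debug = df.assign(m1=m1, m2=m2, group=group)
debug['flag'] = ''
debug.loc[idx, 'flag'] = 'X'
print(debug)
```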
    

    Variant without groupby:

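    # m1: rows where both columns are False (each one starts a new group)
    m1 = df.eq([False, False]).all(axis=1)
    # m2: rows matching [True, False] that come after the first group start
    # (m1.cummax() is True from the first [False, False] row onward)
    m2 = (df.eq([True, False]).all(axis=1)
          & m1.cummax()
          )
    # number the groups
    group = m1.cumsum()
    
    # the first remaining row of each group is the one to flag
    idx = group[m2].drop_duplicates().index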
    
    # assign flag
    df.loc[idx, 'flag'] = 'X'
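
Either variant can be cross-checked against a plain loop that applies the rule row by row. The sketch below is an illustration, not part of the original answer. The helper names `first_after_start_vectorized` and `first_after_start_loop` are hypothetical, and it assumes that every [False, False] row re-arms the search, which is how `cumsum` forms the groups.

```python
import pandas as pd


def first_after_start_vectorized(df):
    """Index labels of the first [True, False] row in each group."""
    m1 = df.eq([False, False]).all(axis=1)
    m2 = df.eq([True, False]).all(axis=1) & m1.cummax()
    group = m1.cumsum()
    return group[m2].drop_duplicates().index.tolist()


def first_after_start_loop(df):
    """Same rule as an explicit state machine, for verification only."""
    found = []
    armed = False  # True once a [False, False] row has been seen
    for label, a, b in zip(df.index, df['A'], df['B']):
        if not a and not b:
            armed = True          # every start row opens a new group
        elif a and not b and armed:
            found.append(label)   # first match in the current group
            armed = False         # ignore later matches until re-armed
    return found


df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

print(first_after_start_vectorized(df))  # [6, 9]
print(first_after_start_loop(df))        # [6, 9]
```

The loop is slower on large frames, so it is best kept as a test oracle rather than a replacement for the vectorized code.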