Search code examples
pandasfiltergroup-byseries

How to check pair of string values in a column, after grouping the dataframe using ID column?


My Doubt in a Table/Dataframe viewI have a dataframe containing 2 columns: ID and Code.

ID Code Flag
1   A   0
1   C   1
1   B   1
2   A   0
2   B   1
3   A   0
4   C   0

Within each ID, if Code 'A' exists with 'B' or 'C', then it should flag 1.

I tried Groupby('ID') with filter(). but it is not showing the perfect result. Could anyone please help ?


Solution

  • You can do the following:

    First use pd.groupby('ID') and concatenate the codes using 'sum' to create a new column. Then assing the value 1 if a row contains A or B as Code and when the new column contains an A:

    df['s'] = df.groupby('ID').Code.transform('sum')
    df['Flag'] = 0
    df.loc[((df.Code == 'B') | (df.Code == 'C')) & df.s.str.contains('A'), 'Flag'] = 1
    df = df.drop(columns = 's')
    
    

    Output:

       ID Code  Flag
    0   1    A     0
    1   1    C     1
    2   1    B     1
    3   2    A     0
    4   2    B     1
    5   3    A     0
    6   4    C     0