
Set a boolean Mask for a Group with a Condition


If the value '2035' appears in column 'Age', every row of the associated group in column 'Name' should be set to True in a new column.

My test DataFrame:

import pandas as pd

# initialise data from lists
data = {'PERNR':[111111, 111111, 111111, 111111, 111111, 111111, 111111, 222222, 222222, 222222, 222222, 222222, 222222],
        'Name':['11.11.2024', '11.11.2024', '11.11.2024', '11.11.2024', '14.11.2024', '14.11.2024', '14.11.2024',
                '20.11.2024', '20.11.2024', '20.11.2024', '11.11.2024', '11.11.2024', '11.11.2024'],
        'Age':['', '', '', '', '', '2035', '', '', '', '', '', '', '2035']}
df = pd.DataFrame(data)

I tried this solution:

df['new'] = df['Age'].eq('2035').groupby(df['Name']).transform('any')

Out[400]: 
     PERNR        Name   Age    new
0   111111  11.11.2024         True
1   111111  11.11.2024         True
2   111111  11.11.2024         True
3   111111  11.11.2024         True
4   111111  14.11.2024         True
5   111111  14.11.2024  2035   True
6   111111  14.11.2024         True
7   222222  20.11.2024        False
8   222222  20.11.2024        False
9   222222  20.11.2024        False
10  222222  11.11.2024         True
11  222222  11.11.2024         True
12  222222  11.11.2024  2035   True


But it should be:

Out[400]: 
     PERNR        Name   Age    new
0   111111  11.11.2024        False
1   111111  11.11.2024        False
2   111111  11.11.2024        False
3   111111  11.11.2024        False
4   111111  14.11.2024         True
5   111111  14.11.2024  2035   True
6   111111  14.11.2024         True
7   222222  20.11.2024        False
8   222222  20.11.2024        False
9   222222  20.11.2024        False
10  222222  11.11.2024         True
11  222222  11.11.2024         True
12  222222  11.11.2024  2035   True
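Comparing the two outputs, only rows 0 to 3 differ. The minimal diagnostic sketch below reuses the test data above: it counts how many distinct PERNR values share each 'Name' value and lists the rows where grouping by 'Name' alone disagrees with grouping by the (PERNR, Name) pair. The variable names `owners`, `hit`, `by_name`, `by_pair` and `diff` are illustrative only.

```python
import pandas as pd

# Same test data as in the question
data = {'PERNR': [111111, 111111, 111111, 111111, 111111, 111111, 111111,
                  222222, 222222, 222222, 222222, 222222, 222222],
        'Name': ['11.11.2024', '11.11.2024', '11.11.2024', '11.11.2024',
                 '14.11.2024', '14.11.2024', '14.11.2024',
                 '20.11.2024', '20.11.2024', '20.11.2024',
                 '11.11.2024', '11.11.2024', '11.11.2024'],
        'Age': ['', '', '', '', '', '2035', '', '', '', '', '', '', '2035']}
df = pd.DataFrame(data)

# Number of distinct PERNR values per 'Name' (date) value
owners = df.groupby('Name')['PERNR'].nunique()
print(owners)  # '11.11.2024' is shared by 2 PERNR values

# Flag per group: once with 'Name' only, once with the (PERNR, Name) pair
hit = df['Age'].eq('2035')
by_name = hit.groupby(df['Name']).transform('any')
by_pair = hit.groupby([df['PERNR'], df['Name']]).transform('any')

# Rows where the two groupings give different flags
diff = by_name.ne(by_pair)
print(diff[diff].index.tolist())  # [0, 1, 2, 3]
```

Because the date '11.11.2024' belongs to two different PERNR values, a 'Name'-only group mixes rows from both, so the '2035' in row 12 also marks rows 0 to 3.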

I have tried many other approaches but haven't found a solution.

Thanks for any help!


Solution

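  • The 'Name' value is not unique across employees: '11.11.2024' occurs for both PERNR 111111 and 222222, so grouping by 'Name' alone lets the '2035' in row 12 mark rows 0 to 3 as well. Group by both the PERNR and Name columns instead: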

    df['new'] = df['Age'].eq('2035').groupby([df["PERNR"], df["Name"]]).transform('any')
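A self-contained sketch that applies the grouped version to the question's data and compares it with the expected output. It also shows one possible alternative that skips `groupby`: collect the (PERNR, Name) pairs that contain '2035', then test each row's pair for membership with `pandas.MultiIndex.isin`. The column name `new_alt` and the `expected` list are illustrative; both approaches assume an exact string match on '2035'.

```python
import pandas as pd

# Same test data as in the question
data = {'PERNR': [111111, 111111, 111111, 111111, 111111, 111111, 111111,
                  222222, 222222, 222222, 222222, 222222, 222222],
        'Name': ['11.11.2024', '11.11.2024', '11.11.2024', '11.11.2024',
                 '14.11.2024', '14.11.2024', '14.11.2024',
                 '20.11.2024', '20.11.2024', '20.11.2024',
                 '11.11.2024', '11.11.2024', '11.11.2024'],
        'Age': ['', '', '', '', '', '2035', '', '', '', '', '', '', '2035']}
df = pd.DataFrame(data)

# Grouped version: a group is identified by the (PERNR, Name) pair
df['new'] = df['Age'].eq('2035').groupby([df['PERNR'], df['Name']]).transform('any')

# Alternative without groupby: collect the key pairs whose rows contain '2035',
# then flag every row whose (PERNR, Name) pair is among them
keys = pd.MultiIndex.from_frame(df[['PERNR', 'Name']])
hit_keys = keys[df['Age'].eq('2035').to_numpy()]
df['new_alt'] = keys.isin(hit_keys)

# Expected flags from the question's desired output
expected = [False] * 4 + [True] * 3 + [False] * 3 + [True] * 3
print(df['new'].tolist() == expected)      # True
print(df['new_alt'].tolist() == expected)  # True
```

The `groupby(...).transform('any')` line is the direct fix; the membership test may be handier if the set of flagged (PERNR, Name) pairs is needed elsewhere as well.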