python pandas dataframe hierarchical-data data-transform

Dropping grouped rows based on a certain hierarchical column

Suppose I have this pandas dataset:

ID	Question Code
1	Q01
1	Q01-1
1	Q02
2	Q01
2	Q02
2	Q02-1
2	Q02-1-1
2	Q02-2

I want to remove the rows based on certain hierarchical conditions between the values of question codes per ID. For example, Q01-1 is a sub-question when Q01 is answered, soI don't need to keep Q01 anymore since we already have Q01-1. By ID 2, I need to show Q01, Q02-1-1 (since it is a sub-question of Q02-1, which is also one of Q02) and Q02-2 (since it is also another sub-question of Q02).

The desired final result of the table above would be:

ID	Question Code
1	Q01-1
1	Q02
2	Q01
2	Q02-1-1
2	Q02-2

Thanks in advance for the help!

Solution

You could extract the part before the trailing -xxx and use this to identify the levels to drop with boolean indexing. Perform per group with groupby.transform:

out = df[df.groupby('ID')['Question Code']
           .transform(lambda x: ~x.isin(x.str.extract('(.*)-[^-]+$',
                                        expand=False).dropna()))]

Output:

   ID Question Code
1   1         Q01-1
2   1           Q02
3   2           Q01
6   2       Q02-1-1
7   2         Q02-2

Example intermediates for ID 2:

# s.str.extract('(.*)-[^-]+$', expand=False).dropna()
5      Q02
6    Q02-1
7      Q02
Name: Question Code, dtype: object

# ~s.isin(s.str.extract('(.*)-[^-]+$', expand=False).dropna())
3     True
4    False
5    False
6     True
7     True
Name: Question Code, dtype: bool