I am trying to wrap my head around multilevel indices.
Specifically, i am trying to get all level 0 indicies that fullfill an 'ANY' criteria. But i can't for the life of me, understand how to get it to work.
For instance, in the dataframe below, we want all indicies that have a '3' in the column 'test_variable_2'
event_name | test_variable_1 | test_variable_2 | test_variable_3 | |
---|---|---|---|---|
subject_id | ||||
1 | pre_event | NaN | 3 | foo |
1 | intra_event | 15 | NaN | bar |
1 | post_event | 30 | NaN | fum |
2 | pre_event | NaN | 2 | foo |
2 | intra_event | 45 | NaN | NaN |
2 | post_event | 60 | NaN | fum |
3 | pre_event | NaN | 3 | foo |
3 | intra_event_1 | 75 | NaN | bar |
3 | intra_event_2 | 90 | NaN | NaN |
3 | post_event | 105 | NaN | fum |
And the result should be:
event_name | test_variable_1 | test_variable_2 | test_variable_3 | |
---|---|---|---|---|
subject_id | ||||
1 | pre_event | NaN | 3 | foo |
1 | intra_event | 15 | NaN | bar |
1 | post_event | 30 | NaN | fum |
3 | pre_event | NaN | 3 | foo |
3 | intra_event_1 | 75 | NaN | bar |
3 | intra_event_2 | 90 | NaN | NaN |
3 | post_event | 105 | NaN | fum |
I thought about using the .groupby function, but I am worried that i loose some of the test-variables that contain several values. The solution i have thus far is to select the indicies that fullfill the boolean mask, and then drop all other indicies, but it seems cumbersome, and not very panda'esque.
I am certain there is a way of harnessing the multilevel indicies. Any pointers in the right direction would help.
Use groupby().transform
:
df[df['test_variable_2'].eq(3).groupby(level=0).transform('any')]
Or you can just use unique
on the valid indexes:
df.loc[df[df.test_variable_2.eq(3)].index.unique()]
Output:
event_name test_variable_1 test_variable_2 test_variable_3
subject_id
1 pre_event NaN 3.0 foo
1 intra_event 15.0 NaN bar
1 post_event 30.0 NaN fum
3 pre_event NaN 3.0 foo
3 intra_event_1 75.0 NaN bar
3 intra_event_2 90.0 NaN NaN
3 post_event 105.0 NaN fum