I have this MultiIndex dataframe, df after parsing some text columns for dates with regex.
df.columns
Index(['all', 'month', 'day', 'year'], dtype='object')
all month day year
match
456 0 2000 1 1 2000
461 0 16 1 1 16
1 1991 1 1 1991
463 0 25 1 1 25
1 2014 1 1 2014
465 0 19 1 1 19
1 1976 1 1 1976
477 0 14 1 1 14
1 1994 1 1 1994
489 0 35 1 1 35
1 1985 1 1 1985
I need to keep the rows with years only (2000,1991,2014,1976,1994,1985). Most of these are indexed as 1 at level 1, except for the first one, (456,0). so that I could handle them this way:
df=df.drop(index=0, level=1)
My result should be this.
all month day year
match
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
I have tried
df.rename(index={(456,0):(456,1)}, level=1, inplace=True)
which did not seem to do anything.
I could do df1=df.drop((456,1)) and df2=df.drop(index=0, level=1) and then concat them and remove the duplicates, but that does not seem very efficient?
I cant drop the MultiIndex because I will need to append this subset to a bigger dataframe later on. Thank you.
First idea is chain 2 masks by |
for bitwise OR
:
df = df[(df.index.get_level_values(1) == 1) | (df.index.get_level_values(0) == 456)]
print (df)
all month day year
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
Another idea if need always first value is possible set array mask by index to True
:
mask = df.index.get_level_values(1) == 1
mask[0] = True
df = df[mask]
print (df)
all month day year
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
Another out of box solution is filtering not duplicated values by Index.duplicated
, working here because first value 456
is unique and for all another values need second rows:
df1 = df[~df.index.get_level_values(0).duplicated(keep='last')]
print (df1)
all month day year
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985