Search code examples
python-3.xpandasrenamemulti-index

How to change single index value in level 1 in MultiIndex dataframe?


I have this MultiIndex dataframe, df after parsing some text columns for dates with regex.

df.columns
Index(['all', 'month', 'day', 'year'], dtype='object')
        all   month day year
  match             
456 0   2000    1   1   2000
461 0   16      1   1   16
    1   1991    1   1   1991
463 0   25      1   1   25
    1   2014    1   1   2014
465 0   19      1   1   19
    1   1976    1   1   1976
477 0   14      1   1   14
    1   1994    1   1   1994
489 0   35      1   1   35
    1   1985    1   1   1985

I need to keep the rows with years only (2000,1991,2014,1976,1994,1985). Most of these are indexed as 1 at level 1, except for the first one, (456,0). so that I could handle them this way:

df=df.drop(index=0, level=1)

My result should be this.

        all   month day year
  match             
456 0   2000    1   1   2000
461 1   1991    1   1   1991
463 1   2014    1   1   2014
465 1   1976    1   1   1976
477 1   1994    1   1   1994
489 1   1985    1   1   1985

I have tried

df.rename(index={(456,0):(456,1)}, level=1, inplace=True)

which did not seem to do anything.

I could do df1=df.drop((456,1)) and df2=df.drop(index=0, level=1) and then concat them and remove the duplicates, but that does not seem very efficient?

I cant drop the MultiIndex because I will need to append this subset to a bigger dataframe later on. Thank you.


Solution

  • First idea is chain 2 masks by | for bitwise OR:

    df = df[(df.index.get_level_values(1) == 1) | (df.index.get_level_values(0) == 456)]
    print (df)
            all  month  day  year
    456 0  2000      1    1  2000
    461 1  1991      1    1  1991
    463 1  2014      1    1  2014
    465 1  1976      1    1  1976
    477 1  1994      1    1  1994
    489 1  1985      1    1  1985
    

    Another idea if need always first value is possible set array mask by index to True:

    mask = df.index.get_level_values(1) == 1
    mask[0] = True
    
    df = df[mask]
    print (df)
            all  month  day  year
    456 0  2000      1    1  2000
    461 1  1991      1    1  1991
    463 1  2014      1    1  2014
    465 1  1976      1    1  1976
    477 1  1994      1    1  1994
    489 1  1985      1    1  1985
    

    Another out of box solution is filtering not duplicated values by Index.duplicated, working here because first value 456 is unique and for all another values need second rows:

    df1 = df[~df.index.get_level_values(0).duplicated(keep='last')]
    print (df1)
            all  month  day  year
    456 0  2000      1    1  2000
    461 1  1991      1    1  1991
    463 1  2014      1    1  2014
    465 1  1976      1    1  1976
    477 1  1994      1    1  1994
    489 1  1985      1    1  1985