python pandas multi-index

pd.Series assignment with pd.IndexSlice results in NaN values despite matching indices


I have a Series with a three-level MultiIndex (one, two, three), shown below.

>>> import pandas as pd
>>> data = [['a', 'X', 'u', 1], ['a', 'X', 'v', 2], ['b', 'Y', 'u', 4], ['a', 'Z', 'u', 20]]
>>> s = pd.DataFrame(data, columns='one two three four'.split()).set_index('one two three'.split()).four
>>> s
one  two  three
a    X    u         1
          v         2
b    Y    u         4
a    Z    u        20
Name: four, dtype: int64

Then I have a second Series indexed only by the levels one and three:

>>> data2 = [['a', 'u', 3], ['a', 'v', -3]]
>>> s2 = pd.DataFrame(data2, columns='one three four'.split()).set_index('one three'.split()).four
>>> s2
one  three
a    u        3
     v       -3
Name: four, dtype: int64

So, as far as I can see, s2 and s.loc[pd.IndexSlice[:, 'X', :]] are indexed identically.

As such I would expect to be able to do:

>>> s.loc[pd.IndexSlice[:, 'X', :]] = s2

and yet doing so results in NaN values:

>>> s
one  two  three
a    X    u         NaN
          v         NaN
b    Y    u         4.0
a    Z    u        20.0
Name: four, dtype: float64

What is the correct way to do this?


Solution

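  • pandas MultiIndexes are sometimes a bit buggy, and this feels like one of those cases. The NaN values suggest that the assignment aligns s2 against s by index label, and that the two-level index of s2 does not line up with the three-level index of s. If you modify s2.index to match s.index, the assignment works: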

    In [155]: s2.index = pd.MultiIndex.from_product([['a'], ['X'], ['u', 'v']], names=['one', 'two', 'three'])
    
    In [156]: s2
    Out[156]:
    one  two  three
    a    X    u        3
              v       -3
    Name: four, dtype: int64
    
    In [157]: s
    Out[157]:
    one  two  three
    a    X    u         1
              v         2
    b    Y    u         4
    a    Z    u        20
    Name: four, dtype: int64
    
    In [158]: s.loc[:, 'X', :] = s2
    
    In [159]: s
    Out[159]:
    one  two  three
    a    X    u         3
              v        -3
    b    Y    u         4
    a    Z    u        20
    Name: four, dtype: int64
    

    It is probably worth searching https://github.com/pandas-dev/pandas/issues for similar reports and opening a new issue if none exists yet.
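The index rewrite shown earlier spells out every label by hand. As a rough sketch of the same idea, the missing level could instead be added programmatically. Using pd.concat with a one-entry dict and reorder_levels is my own choice here, not something from the answer, and the name s2_full is purely illustrative:

```python
import pandas as pd

# Rebuild the two Series from the question.
data = [['a', 'X', 'u', 1], ['a', 'X', 'v', 2], ['b', 'Y', 'u', 4], ['a', 'Z', 'u', 20]]
s = pd.DataFrame(data, columns=['one', 'two', 'three', 'four']).set_index(['one', 'two', 'three'])['four']
data2 = [['a', 'u', 3], ['a', 'v', -3]]
s2 = pd.DataFrame(data2, columns=['one', 'three', 'four']).set_index(['one', 'three'])['four']

# Prepend the missing level 'two' with the constant label 'X',
# then reorder the levels so they match s.index.names.
# (s2_full is a hypothetical helper name, not from the original answer.)
s2_full = pd.concat({'X': s2}, names=['two']).reorder_levels(s.index.names)

# Same assignment as in the answer, now with matching index levels.
s.loc[pd.IndexSlice[:, 'X', :]] = s2_full
print(s)
```

This assumes every row of s2 belongs under the single label 'X'. If s2 had to be spread over several values of level two, the index would need to be built differently.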

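    Another option in the meantime is to reshape the data with .unstack() so that level two becomes the columns, and then assign into the X column (here s and s2 are the original Series from the question):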

    In [181]: s = s.unstack('two')
    
    In [182]: s.loc[s2.index, 'X'] = s2  # one .loc call instead of chained assignment
    
    In [183]: s.stack().swaplevel(1,2).sort_index()
    Out[183]:
    one  two  three
    a    X    u         3.0
              v        -3.0
         Z    u        20.0
    b    Y    u         4.0
    dtype: float64
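A further possibility, not part of the answer above, is to sidestep label alignment altogether by assigning the raw values. The sketch below assumes the selected rows of s appear in the same order as the rows of s2, and it checks that assumption before assigning. The boolean mask and the ValueError guard are illustrative choices of mine:

```python
import pandas as pd

# Rebuild the two Series from the question.
data = [['a', 'X', 'u', 1], ['a', 'X', 'v', 2], ['b', 'Y', 'u', 4], ['a', 'Z', 'u', 20]]
s = pd.DataFrame(data, columns=['one', 'two', 'three', 'four']).set_index(['one', 'two', 'three'])['four']
data2 = [['a', 'u', 3], ['a', 'v', -3]]
s2 = pd.DataFrame(data2, columns=['one', 'three', 'four']).set_index(['one', 'three'])['four']

# Boolean mask selecting the rows of s whose level 'two' is 'X'.
mask = s.index.get_level_values('two') == 'X'

# Positional assignment is only safe if the remaining labels (one, three)
# of the selected rows match s2's index, in the same order.
if not s.index[mask].droplevel('two').equals(s2.index):
    raise ValueError("selected rows of s do not line up with s2")

# A NumPy array carries no index, so pandas assigns by position.
s.loc[mask] = s2.to_numpy()
print(s)
```

Because nothing is aligned by label here, the guard matters: without it, values could end up in the wrong rows with no error.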