Tags: python, pandas, dataframe, multi-index

Set single value on pandas multiindex dataframe


With a single-index dataframe, we can use loc to get, set, and change values:

>>> import pandas as pd
>>> df=pd.DataFrame()
>>> df.loc['A',1]=1
>>> df
     1
A  1.0
>>> df.loc['A',1]=2
>>> df.loc['A',1]
2.0

With a multiindex dataframe that already has rows, loc can likewise get and change values:

>>> df=pd.DataFrame([['A','B',1]])
>>> df=df.set_index([0,1])
>>> df.loc[('A','B'),2]
1
>>> df.loc[('A','B'),2]=3
>>> df.loc[('A','B'),2]
3

but setting a new value on an empty dataframe fails:

>>> df=pd.DataFrame()
>>> df.loc[('A','B'),2]=3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 688, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 630, in _get_setitem_indexer
    return self._convert_tuple(key, is_setter=True)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 754, in _convert_tuple
    idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 1212, in _convert_to_indexer
    return self._get_listlike_indexer(key, axis, raise_missing=True)[1]
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['A', 'B'], dtype='object')] are in the [index]"

Why is this, and what is the "right" way to use loc to set a single value in a multiindex dataframe?


Solution

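  • This fails because the empty DataFrame has a plain single-level index, not a MultiIndex. With only one level, loc reads the tuple ('A','B') as a list of two separate row labels, 'A' and 'B', which is why the KeyError mentions Index(['A', 'B']) rather than a single key.

    You need to initialize the empty DataFrame with an index that has the correct number of levels, for example using pandas.MultiIndex.from_arrays: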

    import pandas as pd

    # Empty two-level MultiIndex, so a 2-tuple key is read as one row label
    idx = pd.MultiIndex.from_arrays([[],[]])
    df = pd.DataFrame(index=idx)

    df.loc[('A','B'), 2] = 3
    

    Output:

           2
    A B  3.0
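
To see the difference between the two starting points, a minimal sketch like the one below can help. It only inspects the number of index levels; the variable names plain and multi are illustrative and not part of the original answer.

```python
import pandas as pd

# A default empty DataFrame has an ordinary single-level index.
plain = pd.DataFrame()
print(plain.index.nlevels)  # 1

# An empty DataFrame built on a two-level MultiIndex, as in the answer above.
multi = pd.DataFrame(index=pd.MultiIndex.from_arrays([[], []]))
print(multi.index.nlevels)  # 2
```

With one level, a tuple passed to loc is treated as several labels; with two levels, a 2-tuple can address a single row.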