Search code examples
pythonpandasslicemulti-indexsetvalue

Slicing and assigning values multi-indexed pandas dataframe of unique sequential indices


I want to select and change the value of a dataframe cell. There are 2 indices used for this dataframe: 'datetime' and 'idx'. Both contain labels which are unique and sequential. 'datetime' index has datetime label of datetime type, and 'idx' has integer valued labels.

import numpy as np
import pandas as pd

dt = pd.date_range("2010-10-01 00:00:00", periods=5, freq='H')
d = {'datetime': dt, 'a': np.arange(len(dt))-1,'b':np.arange(len(dt))+1}
df = pd.DataFrame(data=d)
df.set_index(keys='datetime',inplace=True,drop=True)
df.sort_index(axis=0,level='datetime',ascending=False,inplace=True)

df.loc[:,'idx'] = np.arange(0, len(df),1)+5
df.set_index('idx',drop=True,inplace=True,append=True)
print(df)

'Here is the dataframe:

                         a  b
datetime            idx      
2010-10-01 04:00:00 5    3  5
2010-10-01 03:00:00 6    2  4
2010-10-01 02:00:00 7    1  3
2010-10-01 01:00:00 8    0  2
2010-10-01 00:00:00 9   -1  1

'Say I want to get the row where idx=5. How do I do that? I could use this:

print(df.iloc[0])

Then I will get result below:

a    3
b    5
Name: (2010-10-01 04:00:00, 5), dtype: int32

But I want to access and set the value in this cell where idx=5, column='a', by specifying idx value, and column name 'a'. How do I do that?

Please advice.


Solution

  • You can use DatFrame.query() method for querying MultiIndex DFs:

    In [54]: df
    Out[54]:
                             a  b
    datetime            idx
    2010-10-01 04:00:00 5    3  5
    2010-10-01 03:00:00 6    2  4
    2010-10-01 02:00:00 7    1  3
    2010-10-01 01:00:00 8    0  2
    2010-10-01 00:00:00 9   -1  1
    
    In [55]: df.query('idx==5')
    Out[55]:
                             a  b
    datetime            idx
    2010-10-01 04:00:00 5    3  5
    
    In [56]: df.query('idx==5')['a']
    Out[56]:
    datetime             idx
    2010-10-01 04:00:00  5      3
    Name: a, dtype: int32
    

    Or you can use DataFrame.eval() method if you need to set/update some cells:

    In [61]: df.loc[df.eval('idx==5'), 'a'] = 100
    
    In [62]: df
    Out[62]:
                               a  b
    datetime            idx
    2010-10-01 04:00:00 5    100  5
    2010-10-01 03:00:00 6      2  4
    2010-10-01 02:00:00 7      1  3
    2010-10-01 01:00:00 8      0  2
    2010-10-01 00:00:00 9     -1  1
    

    Explanation:

    In [59]: df.eval('idx==5')
    Out[59]:
    datetime             idx
    2010-10-01 04:00:00  5       True
    2010-10-01 03:00:00  6      False
    2010-10-01 02:00:00  7      False
    2010-10-01 01:00:00  8      False
    2010-10-01 00:00:00  9      False
    dtype: bool
    
    In [60]: df.loc[df.eval('idx==5')]
    Out[60]:
                             a  b
    datetime            idx
    2010-10-01 04:00:00 5    3  5
    

    PS if your original MultiIndex doesn't have names, you can easily set them using rename_axis() method:

    df.rename_axis(('datetime','idx')).query(...)
    

    Alternative (bit more expensive) solution - using sort_index() + pd.IndexSlice[]:

    In [106]: df.loc[pd.IndexSlice[:,5], ['a']]
    ...
    skipped
    ...
    KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
    

    so we would need to sort index first:

    In [107]: df.sort_index().loc[pd.IndexSlice[:,5], ['a']]
    Out[107]:
                             a
    datetime            idx
    2010-10-01 04:00:00 5    3