Tags: python, pandas, dataframe, multi-index

Selecting explicit cells from pd.DataFrame via .at with MultiIndex


I have a MultiIndex-based pd.DataFrame:

import pandas as pd

data = pd.DataFrame(
    [[2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13]],
    index=pd.MultiIndex.from_tuples([
        (pd.Timestamp('2019-07-01 23:00:00'), pd.Timestamp('2019-07-01 23:00:00'), 0),
        (pd.Timestamp('2019-07-02 00:00:00'), pd.Timestamp('2019-07-02 00:00:00'), 0),
        (pd.Timestamp('2019-07-02 00:00:00'), pd.Timestamp('2019-07-02 00:00:00'), 0),
        (pd.Timestamp('2019-07-02 01:00:00'), pd.Timestamp('2019-07-02 01:00:00'), 0),
        (pd.Timestamp('2019-07-02 02:00:00'), pd.Timestamp('2019-07-02 02:00:00'), 0),
        (pd.Timestamp('2019-07-02 03:00:00'), pd.Timestamp('2019-07-02 03:00:00'), 0)],
        names=['dt_calc', 'dt_fore', 'positional_index']),
    columns=['temp', 'temp_2'])

Now I want to replace a single cell with a list object (the DataFrame has been cast to object dtype beforehand):

idx = data.index[0]
data.at[idx, 'temp'] = [1,2,3]

This raises:

ValueError                                Traceback (most recent call last)
/app/generic_model/modules/feature_engineering/lstm_pre_processing.py in <module>
----> 1 data.at[idx, 'temp']

/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
   2151             # GH#33041 fall back to .loc
   2152             if not isinstance(key, tuple) or not all(is_scalar(x) for x in key):
-> 2153                 raise ValueError("Invalid call for scalar access (getting)!")
   2154             return self.obj.loc[key]
   2155 

ValueError: Invalid call for scalar access (getting)!

I do not know what the problem is, since using .loc works fine. With .loc, however, I am not able to replace the cell value, and the error message is not really helpful in this case.

I am running pandas 1.2.2 (pd.__version__) on Python 3.8.


Solution

  • We can still use loc to assign a single cell value by creating an intermediate Series whose index matches the cell to be updated. As a side note, storing complex objects in pandas columns is generally not good practice, as you will lose the benefits of vectorization.

    data.loc[idx, 'temp'] = pd.Series([[1, 2, 3]], index=[idx])
    

                                                                   temp  temp_2
    dt_calc             dt_fore             positional_index                   
    2019-07-01 23:00:00 2019-07-01 23:00:00 0                 [1, 2, 3]       3
    2019-07-02 00:00:00 2019-07-02 00:00:00 0                         4       5
                                            0                         6       7
    2019-07-02 01:00:00 2019-07-02 01:00:00 0                         8       9
    2019-07-02 02:00:00 2019-07-02 02:00:00 0                        10      11
    2019-07-02 03:00:00 2019-07-02 03:00:00 0                        12      13
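
A possible alternative, offered as a sketch rather than a definitive fix: the traceback shows .at taking its .loc fallback path (the GH#33041 comment) and rejecting the key because the row label is a tuple rather than a scalar. In pandas 1.2 that fallback appears to be taken when an axis holds duplicate labels, which is the case here, since the 2019-07-02 00:00:00 row occurs twice. Assuming the target cell can be addressed by position (idx is data.index[0], so row 0), the positional accessor .iat avoids label lookup entirely. The names stamps, index and col below are introduced only for this illustration, and behavior may differ between pandas versions.

```python
import pandas as pd

# Rebuild the frame from the question; note the repeated 00:00:00 row.
stamps = pd.to_datetime([
    "2019-07-01 23:00:00",
    "2019-07-02 00:00:00",
    "2019-07-02 00:00:00",  # duplicate index entry
    "2019-07-02 01:00:00",
    "2019-07-02 02:00:00",
    "2019-07-02 03:00:00",
])
index = pd.MultiIndex.from_arrays(
    [stamps, stamps, [0] * len(stamps)],
    names=["dt_calc", "dt_fore", "positional_index"],
)
data = pd.DataFrame(
    [[2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13]],
    index=index,
    columns=["temp", "temp_2"],
)

# The row index contains a duplicate, so it is not unique.
print(data.index.is_unique)  # False

# Assumption: cast to object first (as the question does) so a cell can hold a list.
data = data.astype(object)

# Positional assignment: row 0 is the row that data.index[0] refers to.
col = data.columns.get_loc("temp")
data.iat[0, col] = [1, 2, 3]

print(data.iat[0, col])  # [1, 2, 3]
```

If the row position is not known in advance, it has to be looked up first, and with duplicate labels a label may map to more than one position, so the .loc approach above may still be the safer choice in that situation.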