Search code examples
pythonpandasdataframeslicemulti-index

Assign to a whole column of a Pandas DataFrame with MultiIndex?


I have a DataFrame(called midx_df) with a MultiIndex, I want to assign values from a whole column of another DataFrame(called sour_df) with single level index to midx_df.

All of index values of sour_df exist in the top level index of midx_df, I need to specify the level-1 index to add/modify all values of rows with same level-1 index.

For example:

beg_min = pd.to_datetime('2023/03/18 18:50', yearfirst=True)
end_min = pd.to_datetime('2023/03/18 18:53', yearfirst=True)
minutes = pd.date_range(start=beg_min, end=end_min, freq='1min')
actions = ['Buy', 'Sell']
m_index = pd.MultiIndex.from_product([minutes, actions], names=['time', 'action'])
sour_df = pd.DataFrame(index=minutes, columns=['price'])
sour_df.index.rename('time', inplace=True)
sour_df.loc[minutes[0], 'price'] = 'b0'
sour_df.loc[minutes[1], 'price'] = 'b1'
sour_df.loc[minutes[3], 'price'] = 'b2'

midx_df = pd.DataFrame(index=m_index, columns=['price'])
print(midx_df)

midx_df.loc[(beg_min, 'Buy'), 'price'] = 123    # works but only for one row!
midx_df.loc[(end_min, 'Buy')]['price'] = 124    # doesn't work!
print(midx_df)

midx_df.loc[(slice(None), 'Buy'), 'price'] = sour_df    # doesn't work!
print(midx_df)

midx_df.loc[(slice(None), 'Buy'), 'price'] = sour_df['price']    # doesn't work!
print(midx_df)

#midx_df.loc[(slice(None), 'Buy')]['price'] = sour_df['price']    # doesn't work!
#print(midx_df)

midx_df.loc[pd.IndexSlice[:, 'Buy'], :] = sour_df    # doesn't work!
print(midx_df)

What is the correct way to do that?


Solution

  • This is an interesting question. The problem here is your indexes are not aligned: ('time', 'action') vs 'time' only so pandas can't set correct values.

    You have to reuse the index of midx_df to reindex sour_df. pd.concat can bu used to accomplish this task:

    midx_df.loc[(slice(None), 'Buy'), 'price'] = \
        pd.concat([sour_df], keys=['Buy'], names=['action']).swaplevel()
    print(midx_df)
    
    # Output
                               price
    time                action      
    2023-03-18 18:50:00 Buy       b0
                        Sell     NaN
    2023-03-18 18:51:00 Buy       b1
                        Sell     NaN
    2023-03-18 18:52:00 Buy      NaN
                        Sell     NaN
    2023-03-18 18:53:00 Buy       b2
                        Sell     NaN
    

    Or use pd.MultiIndex.from_product:

    midx_df.loc[(slice(None), 'Buy'), 'price'] = \
        sour_df.set_index(pd.MultiIndex.from_product([sour_df.index, ['Buy']]))
    

    Details:

    >>> midx_df.loc[(slice(None), 'Buy'), 'price']
    time                 action
    2023-03-18 18:50:00  Buy       NaN
    2023-03-18 18:51:00  Buy       NaN
    2023-03-18 18:52:00  Buy       NaN
    2023-03-18 18:53:00  Buy       NaN
    Name: price, dtype: object
    
    >>> pd.concat([sour_df], keys=['Buy'], names=['action']).swaplevel()
                               price
    time                action      
    2023-03-18 18:50:00 Buy       b0
    2023-03-18 18:51:00 Buy       b1
    2023-03-18 18:52:00 Buy      NaN
    2023-03-18 18:53:00 Buy       b2
    
    >>> sour_df.set_index(pd.MultiIndex.from_product([sour_df.index, ['Buy']]))
                            price
    time                         
    2023-03-18 18:50:00 Buy    b0
    2023-03-18 18:51:00 Buy    b1
    2023-03-18 18:52:00 Buy   NaN
    2023-03-18 18:53:00 Buy    b2
    

    Now the indexes are well aligned to set values.