Search code examples
pythonpandasmulti-index

Pandas query multiindex dataframe based on another single index dataframe


I have two dataframes: Data & Positions.

Data has multiindex: 'Date' and 'Symbol'. Positions has a single index: 'Date' (and has a column with the label 'Symbol'). Both 'Date'-s are DatetimeIndexes.

I want to fetch the 'Open' value from Data to Positions for all the dates (indexes) in Positions.

If I try the following:

positions['Open_price'] = data.loc['1997-02-10','AA'].Open

I get the 'Open' value correctly for the 'AA' Symbol on 1997-02-10 for the 'Open_price' field for the whole Positions dataframe. However, my goal is to get 'Open_price' for the relevant Date (index) and Symbol (column value).

So I try this:

positions['Open_price'] = data.loc[positions.index,positions.Symbol].Open

Then I get an error:

KeyError: "None of [DatetimeIndex(['1999-01-01',  ...\n  '2018-07-30'],\n  
        dtype='datetime64[ns]', length=7151, freq='D')] 
are in the [index]"

NOTE: It is important that Data's 'Date' index has missing values (weekends). Positions' 'Date' index has no missing dates.

How can I make this work?

IMPORTANT:

Accepted answer works but I needed to upgrade pandas version 0.20.x to 0.23.4!


Solution

  • You can use .join() and .rename():

    position.join(data, on=['Date','Symbol']).rename(columns={'Open': 'Open_price'})
    

    Here is a full example:

    data = pd.DataFrame([['08-02-2018', 'NDA', 0.123], ['08-02-2018','DFA', 0.234],
                        ['08-03-2018', 'NFX', 0.451], ['08-04-2018', 'BBA', 0.453]],
                       columns=['Date', 'Symbol', 'Open']).set_index(['Date', 'Symbol'])
    
    position = pd.DataFrame([['08-02-2018', 'NDA'],['08-03-2018', 'NFX'],
                        ['08-04-2018', 'TYA'],['08-04-2018', 'BBA']],
                       columns=['Date', 'Symbol']).set_index(['Date'])
    
    data.index = data.index.set_levels([pd.to_datetime(data.index.levels[0]), data.index.levels[1]])
    position.index = pd.to_datetime(position.index)
    
    position = position.join(data, on=['Date','Symbol']).rename(columns={'Open': 'Open_price'})
    

    Which gives:

    data

                        Open
    Date       Symbol       
    2018-08-02 NDA     0.123
               DFA     0.234
    2018-08-03 NFX     0.451
    2018-08-04 BBA     0.453
    

    position

               Symbol
    Date             
    2018-08-02    NDA
    2018-08-03    NFX
    2018-08-04    TYA
    2018-08-04    BBA
    

    Using the above .join() and printing position gives:

               Symbol   Open
    Date                    
    2018-08-02    NDA  0.123
    2018-08-03    NFX  0.451
    2018-08-04    TYA    NaN
    2018-08-04    BBA  0.453