Search code examples
pythonpandasapplyattributeerrormulti-index

Pandas: Multi-index apply function between column and index


I have a multi-index dataframe that look like this:

In[13]: df
Out[13]:
              Last Trade
Date       Ticker           
1983-03-30 CLM83  1983-05-18
           CLN83  1983-06-17
           CLQ83  1983-07-18
           CLU83  1983-08-19
           CLV83  1983-09-16
           CLX83  1983-10-18
           CLZ83  1983-11-18
1983-04-04 CLM83  1983-05-18
           CLN83  1983-06-17
           CLQ83  1983-07-18
           CLU83  1983-08-19
           CLV83  1983-09-16
           CLX83  1983-10-18
           CLZ83  1983-11-18

With two levels for indexes (namely 'Date' and 'Ticker'). I would like to apply a function to the column 'Last Trade' that would let me know how many months separate this 'Last Trade' date from the index 'Date' I found a function that does the calculation:

from calendar import monthrange

def monthdelta(d1, d2):
    delta = 0
    while True:
        mdays = monthrange(d1.year, d1.month)[1]
        d1 += datetime.timedelta(days=mdays)
        if d1 <= d2:
            delta += 1
        else:
            break
    return delta

I tried to apply the following function h but it returns me an AttributeError: 'Timestamp' object has no attribute 'index':

In[14]: h = lambda x: monthdelta(x.index.get_level_values(0),x)

In[15]: df['Last Trade'] = df['Last Trade'].apply(h)

How can I apply a function that would use both a column and an index value?

Thank you for your tips,


Solution

  • Try this instead of your function:

    Option 1

    You get an integer number

    def monthdelta(row):
        trade = row['Last Trade'].year*12 + row['Last Trade'].month
        date = row['Date'].year*12 + row['Date'].month
        return trade - date
    
    df.reset_index().apply(monthdelta, axis=1)
    

    Inspired by PiRsquared:

    df = df.reset_index()
    (df['Last Trade'].dt.year*12 + df['Last Trade'].dt.month) -\
    (df['Date'].dt.year*12 + df['Date'].dt.month)
    

    Option 2

    You get a numpy.timedelta64

    Which can be directly used for other date computations. However, this will be in the form of days, not months, because the number of days in a month are not constant.

    def monthdelta(row):
        return row['Last Trade'] - row['Date']
    
    df.reset_index().apply(monthdelta, axis=1)
    

    Inspired by PiRsquared:

    df = df.reset_index()
    df['Last Trade'] - df['Date']
    

    Option 2 will of course be faster, because it involves less computations. Pick what you like!


    To get your index back: df.index = df[['Date', 'Ticker']]