Search code examples
lambdaapplymulti-indexpandas-groupby

How to use pandas groupby and apply lambda to evaluate a boolean condition


I've been teaching myself python using stock data but I've been stuck on this issue. I'm trying to identify a moving average cross-over. I'm working with daily data in a pandas MultiIndex DataFrame. Below is a snippet of the data structure I'm working with.

import pandas as pd
import numpy as np

data = {'date': pd.Series(['2016-1-4', '2016-1-4', '2016-1-4', 
                           '2016-1-5', '2016-1-5', '2016-1-5', 
                           '2016-1-6', '2016-1-6', '2016-1-6']),
        'ticker': pd.Series(['NYMX', 'EVAR', 'PMV', 
                             'NYMX', 'EVAR', 'PMV', 
                             'NYMX', 'EVAR', 'PMV']),
        'twohundredsma': pd.Series([2.3, 3.58, 0.458, 
                                    2.31, 3.56, 0.459, 
                                    2.32, 3.55, 0.46]),
        'fiveema': pd.Series([2.33, 1.31, 0.54, 
                              2.33, 1.28, 0.54, 
                              2.3, 1.25, 0.54])}

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df.set_index(['date', 'ticker'], inplace=True)

A cross-over can be identified by taking the difference between the two moving averages and using shift to check for a change in sign from the previous day. I've tested this approach (without groupby) and it works great, providing a True value whenever a crossover has occurred.

However, the issue I'm having is with using the groupby function to apply this function to each stock ticker. My initial approach was to use an apply lambda function. The code below adds the 2 new columns but the "five200bull" column is filled with "nan" values with no errors being thrown.

def five_cross(df):
    df['fiveminus200'] = df['fiveema'] - df['twohundredsma']    

    df['five200bull'] = df.groupby(level='ticker').apply(lambda x: 
      np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1)))

So I tried a different approach where I passed each ticker as a dataframe to a separate function. This approach was much slower when working with a large dataframe, but this didn't work either.

def add_five_bull(df):
    df['five200bull'] = np.sign(df['fiveminus200']) != np.sign(df['fiveminus200'].shift(1))

def five_cross(df):
    df['fiveminus200'] = df['fiveema'] - df['twohundredsma']    

    # group by ticker
    grouped = df.groupby(level='ticker')

    # pass each ticker in a df to function
    for tick, group in grouped:
        add_five_bull(group)

With this approach the "five200bull" column is never appended to the df and I receive the infamous SettingWithCopyWarning. I tried adding df.loc[:, 'fiveminus200'] to the add_five_bull function but other than taking much longer with a large dataset, it didn't appear to have any result.

Obviously there's some flaw in my logic and I would appreciate any help in resolving.


Solution

  • I believe you need parameter group_keys=False for remove appending new level in output - then data are aliged. Also shift return first value NaN per group, so np.sign raise warning:

    RuntimeWarning: invalid value encountered in sign np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1)))

    Solution is replace NaN to some value, e.g. False or True by fillna:

    def five_cross(df):
        df['fiveminus200'] = df['fiveema'] - df['twohundredsma']    
    
        df['five200bull'] = df.groupby(level='ticker', group_keys=False).apply(lambda x: 
          np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1).fillna(False)))
        return df  
    
    print (five_cross(df))
                       fiveema  twohundredsma  fiveminus200  five200bull
    date       ticker                                                   
    2016-01-04 NYMX       2.33          2.300         0.030         True
               EVAR       1.31          3.580        -2.270         True
               PMV        0.54          0.458         0.082         True
    2016-01-05 NYMX       2.33          2.310         0.020        False
               EVAR       1.28          3.560        -2.280        False
               PMV        0.54          0.459         0.081        False
    2016-01-06 NYMX       2.30          2.320        -0.020         True
               EVAR       1.25          3.550        -2.300        False
               PMV        0.54          0.460         0.080        False
    

    def five_cross(df):
        df['fiveminus200'] = df['fiveema'] - df['twohundredsma']    
    
        df1 = df.groupby(level='ticker').apply(lambda x: 
          np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1).fillna(False)))
        return df1 
    
    print (five_cross(df))
    ticker  date        ticker
    EVAR    2016-01-04  EVAR       True
            2016-01-05  EVAR      False
            2016-01-06  EVAR      False
    NYMX    2016-01-04  NYMX       True
            2016-01-05  NYMX      False
            2016-01-06  NYMX       True
    PMV     2016-01-04  PMV        True
            2016-01-05  PMV       False
            2016-01-06  PMV       False
    Name: fiveminus200, dtype: bool