I've been teaching myself python using stock data but I've been stuck on this issue. I'm trying to identify a moving average cross-over. I'm working with daily data in a pandas MultiIndex DataFrame. Below is a snippet of the data structure I'm working with.
import pandas as pd
import numpy as np
data = {'date': pd.Series(['2016-1-4', '2016-1-4', '2016-1-4',
'2016-1-5', '2016-1-5', '2016-1-5',
'2016-1-6', '2016-1-6', '2016-1-6']),
'ticker': pd.Series(['NYMX', 'EVAR', 'PMV',
'NYMX', 'EVAR', 'PMV',
'NYMX', 'EVAR', 'PMV']),
'twohundredsma': pd.Series([2.3, 3.58, 0.458,
2.31, 3.56, 0.459,
2.32, 3.55, 0.46]),
'fiveema': pd.Series([2.33, 1.31, 0.54,
2.33, 1.28, 0.54,
2.3, 1.25, 0.54])}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df.set_index(['date', 'ticker'], inplace=True)
A cross-over can be identified by taking the difference between the two moving averages and using shift
to check for a change in sign from the previous day. I've tested this approach (without groupby) and it works great, providing a True
value whenever a crossover has occurred.
However, the issue I'm having is with using the groupby
function to apply this function to each stock ticker. My initial approach was to use an apply
lambda
function. The code below adds the 2 new columns but the "five200bull" column is filled with "nan" values with no errors being thrown.
def five_cross(df):
df['fiveminus200'] = df['fiveema'] - df['twohundredsma']
df['five200bull'] = df.groupby(level='ticker').apply(lambda x:
np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1)))
So I tried a different approach where I passed each ticker as a dataframe to a separate function. This approach was much slower when working with a large dataframe, but this didn't work either.
def add_five_bull(df):
df['five200bull'] = np.sign(df['fiveminus200']) != np.sign(df['fiveminus200'].shift(1))
def five_cross(df):
df['fiveminus200'] = df['fiveema'] - df['twohundredsma']
# group by ticker
grouped = df.groupby(level='ticker')
# pass each ticker in a df to function
for tick, group in grouped:
add_five_bull(group)
With this approach the "five200bull" column is never appended to the df and I receive the infamous SettingWithCopyWarning
. I tried adding df.loc[:, 'fiveminus200']
to the add_five_bull
function but other than taking much longer with a large dataset, it didn't appear to have any result.
Obviously there's some flaw in my logic and I would appreciate any help in resolving.
I believe you need parameter group_keys=False
for remove appending new level in output - then data are aliged. Also shift
return first value NaN
per group, so np.sign
raise warning:
RuntimeWarning: invalid value encountered in sign np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1)))
Solution is replace NaN
to some value, e.g. False
or True
by fillna
:
def five_cross(df):
df['fiveminus200'] = df['fiveema'] - df['twohundredsma']
df['five200bull'] = df.groupby(level='ticker', group_keys=False).apply(lambda x:
np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1).fillna(False)))
return df
print (five_cross(df))
fiveema twohundredsma fiveminus200 five200bull
date ticker
2016-01-04 NYMX 2.33 2.300 0.030 True
EVAR 1.31 3.580 -2.270 True
PMV 0.54 0.458 0.082 True
2016-01-05 NYMX 2.33 2.310 0.020 False
EVAR 1.28 3.560 -2.280 False
PMV 0.54 0.459 0.081 False
2016-01-06 NYMX 2.30 2.320 -0.020 True
EVAR 1.25 3.550 -2.300 False
PMV 0.54 0.460 0.080 False
def five_cross(df):
df['fiveminus200'] = df['fiveema'] - df['twohundredsma']
df1 = df.groupby(level='ticker').apply(lambda x:
np.sign(x['fiveminus200'])!=np.sign(x['fiveminus200'].shift(1).fillna(False)))
return df1
print (five_cross(df))
ticker date ticker
EVAR 2016-01-04 EVAR True
2016-01-05 EVAR False
2016-01-06 EVAR False
NYMX 2016-01-04 NYMX True
2016-01-05 NYMX False
2016-01-06 NYMX True
PMV 2016-01-04 PMV True
2016-01-05 PMV False
2016-01-06 PMV False
Name: fiveminus200, dtype: bool