Dataset:
Date | ticker | overnight_return |
---|---|---|
2017-07-20 | CLXT | 0.019556 |
2017-07-21 | CLXT | 0.039778 |
2022-02-14 | ETNB | -0.006186 |
2022-02-15 | ETNB | 0.024590 |
I am testing a hypothesis on the overnight return factor. I want to apply ranking for all unique values in ticker
column for every Date
, then z-scoring the ranks. I want to rank them in terms of deciles.
Code to get z-score for one date:
import scipy.stats as stats
stats.zscore(equity_daily[equity_daily.Date == "2017-07-20"].overnight_return.rank().dropna().values)
To get the z-score for every day based on the rank of all tickers on that particular day I get the pivoted table then create a new table containing the z-scores:
equity_daily.pivot(columns = "ticker", values = "overnight_return", index = "Date")
But the following error occurred:
ValueError: Index contains duplicate entries, cannot reshape
Desired outcome:
Date | ticker | overnight_return | Decile_rank |
---|---|---|---|
2017-07-20 | CLXT | 0.019556 | 0 |
2017-07-21 | CLXT | 0.039778 | 2 |
2022-02-14 | ETNB | -0.006186 | 9 |
2022-02-15 | ETNB | 0.024590 | 8 |
from alphalens.tears import (create_returns_tear_sheet,
create_information_tear_sheet,
create_turnover_tear_sheet,
create_summary_tear_sheet,
create_full_tear_sheet,
create_event_returns_tear_sheet,
create_event_study_tear_sheet)
from alphalens.utils import get_clean_factor_and_forward_returns
def z_score(x):
"""Helper function for Normalization"""
return stats.zscore(x)
equity_daily["overnight_rank"] = equity_daily.groupby("Date")["overnight_return"].rank(method = "first")
equity_daily["overnight_normalized"] = equity_daily.groupby("Date")["overnight_rank"].apply(z_score)
equity_daily["overnight_normalized"] = equity_daily.overnight_normalized.shift(-1)
equity_daily = equity_daily.dropna()
factor = equity_daily[["Date", "ticker", "overnight_normalized"]].\
groupby([pd.Grouper(key = "Date"), "ticker"]).sum()
prices = equity_daily.pivot(columns = "ticker", values = "Close", index = "Date")
factor_data = get_clean_factor_and_forward_returns(
factor = factor,
prices = prices,
groupby = None,
binning_by_group = False,
quantiles = 10,
bins = None,
periods = (1, 5, 10),
filter_zscore = 20,
groupby_labels = None,
max_loss = 0.35
)