Search code examples
pythonpandasgroup-bypivot-tablequantitative-finance

Caclulate decile ranking


Dataset:

Date ticker overnight_return
2017-07-20 CLXT 0.019556
2017-07-21 CLXT 0.039778
2022-02-14 ETNB -0.006186
2022-02-15 ETNB 0.024590

I am testing a hypothesis on the overnight return factor. I want to apply ranking for all unique values in ticker column for every Date, then z-scoring the ranks. I want to rank them in terms of deciles.

Code to get z-score for one date:

import scipy.stats as stats
stats.zscore(equity_daily[equity_daily.Date == "2017-07-20"].overnight_return.rank().dropna().values)

To get the z-score for every day based on the rank of all tickers on that particular day I get the pivoted table then create a new table containing the z-scores:

equity_daily.pivot(columns = "ticker", values = "overnight_return", index = "Date")

But the following error occurred:

ValueError: Index contains duplicate entries, cannot reshape

Desired outcome:

Date ticker overnight_return Decile_rank
2017-07-20 CLXT 0.019556 0
2017-07-21 CLXT 0.039778 2
2022-02-14 ETNB -0.006186 9
2022-02-15 ETNB 0.024590 8

Solution

  • from alphalens.tears import (create_returns_tear_sheet,
                          create_information_tear_sheet,
                          create_turnover_tear_sheet,
                          create_summary_tear_sheet,
                          create_full_tear_sheet,
                          create_event_returns_tear_sheet,
                          create_event_study_tear_sheet)
    
    from alphalens.utils import get_clean_factor_and_forward_returns
    
    def z_score(x):
        """Helper function for Normalization"""
        return stats.zscore(x)
    
    equity_daily["overnight_rank"] = equity_daily.groupby("Date")["overnight_return"].rank(method = "first")
    equity_daily["overnight_normalized"] = equity_daily.groupby("Date")["overnight_rank"].apply(z_score)
    equity_daily["overnight_normalized"] = equity_daily.overnight_normalized.shift(-1)
    equity_daily = equity_daily.dropna()
    
    factor = equity_daily[["Date", "ticker", "overnight_normalized"]].\
                    groupby([pd.Grouper(key = "Date"), "ticker"]).sum()
    
    prices = equity_daily.pivot(columns = "ticker", values = "Close", index = "Date")
    
    factor_data = get_clean_factor_and_forward_returns(
        factor = factor,
        prices = prices,
        groupby = None,
        binning_by_group = False,
        quantiles = 10,
        bins = None,
        periods = (1, 5, 10),
        filter_zscore = 20,
        groupby_labels = None,
        max_loss = 0.35
    )