Search code examples
pythonpandasoptimizationscipyportfolio

SciPy portfolio optimization with industry-level constraints


Trying to optimize a portfolio weight allocation here which maximize my return function by limit risk. I have no problem to find the optimized weight that yields to my return function by simple constraint that the sum of all weight equals to 1, and make the other constraint that my total risk is below target risk.

My problem is, how can I add industry weight bounds for each group?

My code is below:

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import scipy.optimize as sco

dates = pd.date_range('1/1/2000', periods=8)
industry = ['industry', 'industry', 'utility', 'utility', 'consumer']
symbols = ['A', 'B', 'C', 'D', 'E']  
zipped = list(zip(industry, symbols))
index = pd.MultiIndex.from_tuples(zipped)

noa = len(symbols)

data = np.array([[10, 9, 10, 11, 12, 13, 14, 13],
                 [11, 11, 10, 11, 11, 12, 11, 10],
                 [10, 11, 10, 11, 12, 13, 14, 13],
                 [11, 11, 10, 11, 11, 12, 11, 11],
                 [10, 11, 10, 11, 12, 13, 14, 13]])

market_to_market_price = pd.DataFrame(data.T, index=dates, columns=index)

rets = market_to_market_price / market_to_market_price.shift(1) - 1.0
rets = rets.dropna(axis=0, how='all')

expo_factor = np.ones((5,5))
factor_covariance = market_to_market_price.cov()
delta = np.diagflat([0.088024, 0.082614, 0.084237, 0.074648,
                                 0.084237])
cov_matrix = np.dot(np.dot(expo_factor, factor_covariance),
                            expo_factor.T) + delta

def calculate_total_risk(weights, cov_matrix):
    port_var = np.dot(np.dot(weights.T, cov_matrix), weights)
    return port_var

def max_func_return(weights):
    return -np.sum(rets.mean() * weights)

# optimized return with given risk
tolerance_risk = 27
noa = market_to_market_price.shape[1]
cons = ({'type': 'eq', 'fun': lambda x:  np.sum(x) - 1},
         {'type': 'eq', 'fun': lambda x:  calculate_total_risk(x, cov_matrix) - tolerance_risk})
bnds = tuple((0, 1) for x in range(noa))
init_guess = noa * [1. / noa,]
opts_mean = sco.minimize(max_func_return, init_guess, method='SLSQP',
                       bounds=bnds, constraints=cons)


In [88]: rets
Out[88]: 
            industry             utility            consumer
                   A         B         C         D         E
2000-01-02 -0.100000  0.000000  0.100000  0.000000  0.100000
2000-01-03  0.111111 -0.090909 -0.090909 -0.090909 -0.090909
2000-01-04  0.100000  0.100000  0.100000  0.100000  0.100000
2000-01-05  0.090909  0.000000  0.090909  0.000000  0.090909
2000-01-06  0.083333  0.090909  0.083333  0.090909  0.083333
2000-01-07  0.076923 -0.083333  0.076923 -0.083333  0.076923
2000-01-08 -0.071429 -0.090909 -0.071429  0.000000 -0.071429

In[89]: opts_mean['x'].round(3)
Out[89]: array([ 0.233,  0.117,  0.243,  0.165,  0.243])

how can I add such group bound such that sum of 5 assets falling into to below bound?

model = pd.DataFrame(np.array([.08,.12,.05]), index= set(industry), columns = ['strategic'])
model['tactical'] = [(.05,.41), (.2,.66), (0,.16)]
In [85]: model
Out[85]: 
          strategic      tactical
industry       0.08  (0.05, 0.41)
consumer       0.12   (0.2, 0.66)
utility        0.05     (0, 0.16)

I have read this similar post SciPy optimization with grouped bounds but still can't get any clues, can any body help? Thank you.


Solution

  • Firstly, consider using cvxopt, a module designed specifically for convex optimization. I'm not too familiar but an example for an efficient frontier is here.

    Now getting to your question, here's a workaround that applies specifically to the question you posted and uses minimize. (It could be generalized to create more flexibility in input types and user-friendliness, and a class-based implementation would be useful here too.)

    Regarding your question, "how can I add group bounds?", the short answer is that you actually need to do this through the constraints rather than bounds parameter because

    Optionally, the lower and upper bounds for each element in x can also be specified using the bounds argument. [emphasis added]

    This specification doesn't match with what you're trying to do. What the below example does, instead, is to add an inequality constraint separately for the upper and lower bound of each group. The function mapto_constraints returns a list of dicts that is added to your current constraints.

    To begin, here's some example data:

    import pandas as pd
    import numpy as np
    import numpy.random as npr
    npr.seed(123)
    from scipy.optimize import minimize
    
    # Create a DataFrame of hypothetical returns for 5 stocks across 3 industries,
    # at daily frequency over a year.  Note that these will be in decimal
    # rather than numeral form. (i.e. 0.01 denotes a 1% return)
    
    dates = pd.bdate_range(start='1/1/2000', end='12/31/2000')
    industry = ['industry'] * 2 + ['utility'] * 2 + ['consumer']
    symbols = list('ABCDE')
    zipped = list(zip(industry, symbols))
    cols = pd.MultiIndex.from_tuples(zipped)
    returns = pd.DataFrame(npr.randn(len(dates), len(cols)), index=dates, columns=cols)
    returns /= 100 + 3e-3 #drift term
    
    returns.head()
    Out[191]: 
               industry           utility          consumer
                      A        B        C        D        E
    2000-01-03 -0.01484  0.00986 -0.00476  0.00235 -0.00630
    2000-01-04  0.00518  0.00958 -0.01210 -0.00814 -0.01664
    2000-01-05  0.00233 -0.01665 -0.00366  0.00520  0.02058
    2000-01-06  0.00368  0.01253  0.00259  0.00309 -0.00211
    2000-01-07 -0.00383  0.01174  0.00375  0.00336 -0.00608
    

    You can see that the annualized figures "make sense":

    (1 + returns.mean()) ** 252 - 1
    Out[199]: 
    industry  A   -0.05531
              B    0.32455
    utility   C    0.10979
              D    0.14339
    consumer  E   -0.12644
    

    Now for some functions that will be used in the optimization. These are closely modeled after examples from Yves Hilpisch's Python for Finance, chapter 11.

    def logrels(rets):
        """Log of return relatives, ln(1+r), for a given DataFrame rets."""
        return np.log(rets + 1)
    
    def statistics(weights, rets):
        """Compute expected portfolio statistics from individual asset returns.
    
        Parameters
        ==========
        rets : DataFrame
            Individual asset returns.  Use numeral rather than decimal form
        weights : array-like
            Individual asset weights, nx1 vector.
    
        Returns
        =======
        list of (pret, pvol, pstd); these are *per-period* figures (not annualized)
            pret : expected portfolio return
            pvol : expected portfolio variance
            pstd : expected portfolio standard deviation
    
        Note
        ====
        Note that Modern Portfolio Theory (MPT), being a single-period model,
        works with (optimizes using) continuously compounded returns and
        volatility, using log return relatives.  The difference between these and
        more commonly used geometric means will be negligible for small returns.
        """
    
        if isinstance(weights, (tuple, list)):
            weights = np.array(weights)
        pret = np.sum(logrels(rets).mean() * weights)
        pvol = np.dot(weights.T, np.dot(logrels(rets).cov(), weights))
        pstd = np.sqrt(pvol)
        return [pret, pvol, pstd]
    
    # The below are a few convenience functions around statistics() above, needed
    # because scipy minimize must optimize a function that returns a scalar
    
    def port_ret(weights, rets):
        return -1 * statistics(weights=weights, rets=rets)[0]
    
    def port_variance(weights, rets):
        return statistics(weights=weights, rets=rets)[1]
    

    Here is the expected annualized standard deviation of an equal-weight portfolio. I'm just giving this here to use as an anchor in the optimization (the risk_tol parameter).

    statistics([0.2] * 5, returns)[2] * np.sqrt(252) # ew anlzd stdev
    Out[192]: 0.06642120658640735
    

    The next function takes a DataFrame that looks like your model DataFrame and builds constraints for each group. Note that this is pretty inflexible in that you will need to follow the specific format for the returns and model DataFrames you're using now.

    def mapto_constraints(rets, model):
        tactical = model['tactical'].to_dict() # values are tuple bounds
        industries = rets.columns.get_level_values(0)
        group_cons = list()
        for key in tactical:
            if isinstance(industries.get_loc('consumer'), int):
                pos = [industries.get_loc(key)]
            else:
                pos = np.where(industries.get_loc(key))[0].tolist()
            lb = tactical[key][0]
            ub = tactical[key][1] # upper and lower bounds
            lbdict = {'type': 'ineq', 
                      'fun': lambda x: np.sum(x[pos[0]:(pos[-1] + 1)]) - lb}
            ubdict = {'type': 'ineq', 
                      'fun': lambda x: ub - np.sum(x[pos[0]:(pos[-1] + 1)])}
            group_cons.append(lbdict); group_cons.append(ubdict)
        return group_cons
    

    A note on how the constraints are built above:

    Equality constraint means that the constraint function result is to be zero whereas inequality means that it is to be non-negative.

    Lastly, the optimization itself:

    def opt(rets, risk_tol, model, round=3):    
        noa = len(rets.columns)
        guess = noa * [1. / noa,] # equal-weight; needed for initial guess
        bnds = tuple((0, 1) for x in range(noa))
        cons = [{'type': 'eq', 'fun': lambda x: np.sum(x) - 1.},
                {'type': 'ineq', 'fun': lambda x: risk_tol - port_variance(x, rets=rets)}
               ] + mapto_constraints(rets=rets, model=model)
        opt = minimize(port_ret, guess, args=(returns,), method='SLSQP', bounds=bnds, 
                       constraints=cons, tol=1e-10)
        return opt.x.round(round)
    
    model = pd.DataFrame(np.array([.08,.12,.05]), 
                         index= set(industry), columns = ['strategic'])
    model['tactical'] = [(.05,.41), (.2,.66), (0,.16)]
    
    # Set variance threshold equal to the equal-weighted variance
    # Note that I set variance as an inequality rather than equality (i.e.
    # resulting variance should be less than threshold).
    
    opt(returns, risk_tol=port_variance([0.2] * 5, returns), model=model)
    Out[195]: array([ 0.188,  0.225,  0.229,  0.197,  0.16 ])