I have a medium-sized optimization problem that I have been solving with scipy.optimize using the SLSQP method. Is there a faster algorithm?
Here is my code:
from scipy.optimize import minimize, Bounds
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(500,5),columns=['pred','var1','var2','var3','weights'])
def obj(x, df=df):
    return -(x * df['pred']).sum()

def c1(x, df=df):
    return 5 - abs((x * df['var1']).sum())

def c2(x, df=df):
    return 5 - abs((x * df['var2']).sum())

def c3(x, df=df):
    return 5 - abs((x * df['var3']).sum())

sol = minimize(
    fun=obj,
    x0=df['weights'],
    method='SLSQP',
    bounds=Bounds(-0.03, 0.03),
    constraints=[{'type': 'ineq', 'fun': c1},
                 {'type': 'ineq', 'fun': c2},
                 {'type': 'ineq', 'fun': c3}],
    options={'maxiter': 1000})
As you can see, there are three constraints (sometimes 4 or 5), and the objective is to optimize about 500 weights. There are also bounds. The dataframe df is dense; I don't think there is a single zero.
Is the SLSQP method the fastest way to tackle this problem? I am using Google Colab.
To make the results reproducible, I added a random seed at the beginning of your snippet:
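np.random.seed(1)

With that in place, we can measure the execution time of your snippet: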
In [15]: def foo1():
    ...:     sol = minimize(
    ...:         fun=obj,
    ...:         x0=df['weights'],
    ...:         method='SLSQP',
    ...:         bounds=Bounds(-0.03, 0.03),
    ...:         constraints=[{'type': 'ineq', 'fun': c1},
    ...:                      {'type': 'ineq', 'fun': c2},
    ...:                      {'type': 'ineq', 'fun': c3}],
    ...:         options={'maxiter': 1000})
    ...:     return sol
    ...:
In [16]: %timeit foo1()
10.7 s ± 299 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
As mentioned in the comments, each of your constraints can be rewritten as a pair of linear inequalities, which turns your optimization problem into a linear program (LP) that can be solved with scipy.optimize.linprog. As a general guideline: if your problem can be formulated as an LP rather than an NLP, it is advisable to pursue the LP approach, as it is usually much faster to solve.
Each of your constraints has the form | v.T @ x | <= 5, i.e. the absolute value of the dot product (scalar product) of a vector v with x, where v.T denotes the transpose of v and @ denotes Python's matrix-multiplication operator. Since | a | <= 5 is equivalent to -5 <= a <= 5, we have:
| v1.T @ x | <= 5  <=>  -5 <= v1.T @ x <= 5
| v2.T @ x | <= 5  <=>  -5 <= v2.T @ x <= 5
| v3.T @ x | <= 5  <=>  -5 <= v3.T @ x <= 5
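As a quick sanity check (a minimal sketch, reusing df and the start vector from above), the pandas expression inside c1 is exactly such a dot product:

x = df['weights'].values
v1 = df['var1'].values
# (x * df['var1']).sum() is the elementwise product summed up,
# i.e. the dot product v1 @ x
assert np.isclose((x * df['var1']).sum(), v1 @ x)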
Therefore, your LP problem can be formulated as follows:
min c.T @ x
s.t.
v1.T @ x <= 5
-v1.T @ x <= 5
v2.T @ x <= 5
-v2.T @ x <= 5
v3.T @ x <= 5
-v3.T @ x <= 5
-0.03 <= x <= 0.03
This can be solved using the following code:
from scipy.optimize import linprog

# linprog minimizes, so negate the coefficients to maximize (x * df['pred']).sum()
c = -1 * df['pred'].values
v1 = df['var1'].values
v2 = df['var2'].values
v3 = df['var3'].values
# stack the six constraint rows v1, -v1, v2, -v2, v3, -v3 into a (6, 500) matrix
A_ub = np.block([v1, -v1, v2, -v2, v3, -v3]).reshape(6, -1)
b_ub = np.array([5, 5, 5, 5, 5, 5])
# one (lower, upper) pair per variable
bounds = [(-0.03, 0.03)] * c.size
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=None, b_eq=None, bounds=bounds)
Timing this approach yields:
In [17]: %timeit res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=None, b_eq=None, bounds=bounds)
2.32 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
which is approximately 4,600 times faster (10.7 s / 2.32 ms ≈ 4600).
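If you want to double-check that the LP solution is consistent with the SLSQP one (a quick sketch, assuming sol and res from the snippets above are still in scope), you can compare the objective values and verify feasibility:

# both solvers minimize the same objective, so the values should be close
print(sol.fun, res.fun)
# the LP solution should satisfy all linear constraints and the bounds
print(np.all(A_ub @ res.x <= b_ub + 1e-9))
print(np.all(np.abs(res.x) <= 0.03 + 1e-9))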