I am trying to reduce the running time of a function that performs a series of calculations with two matrices. Searching around, I've heard of numpy, but I really do not know how to apply it to my problem. Also, I think one of the things making my function slow is the large number of dot (attribute lookup) operations (I read about that on this page).
The math corresponds to a factorization for the Quadratic Assignment Problem:
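Spelled out (writing f for stream_matrix, d for distance_matrix and sol for the current permutation; this is just the expression the code below computes):

\Delta(r, s) = \sum_{k \ne r,\, s} \big[ f_{rk} (d_{sol[s],\, sol[k]} - d_{sol[r],\, sol[k]})
                                       + f_{sk} (d_{sol[r],\, sol[k]} - d_{sol[s],\, sol[k]})
                                       + f_{kr} (d_{sol[k],\, sol[s]} - d_{sol[k],\, sol[r]})
                                       + f_{ks} (d_{sol[k],\, sol[r]} - d_{sol[k],\, sol[s]}) \big]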
My code is:
delta = 0
for k in xrange(self._tam):
    if k != r and k != s:
        delta += \
            self._data.stream_matrix[r][k] \
            * (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) + \
            self._data.stream_matrix[s][k] \
            * (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) + \
            self._data.stream_matrix[k][r] \
            * (self._data.distance_matrix[sol[k]][sol[s]] - self._data.distance_matrix[sol[k]][sol[r]]) + \
            self._data.stream_matrix[k][s] \
            * (self._data.distance_matrix[sol[k]][sol[r]] - self._data.distance_matrix[sol[k]][sol[s]])
return delta
Running this on a problem of size 20 (20x20 matrices) takes about 20 seconds; the bottleneck is this function:
ncalls tottime percall cumtime percall filename:lineno(function)
303878 15.712 0.000 15.712 0.000 Heuristic.py:66(deltaC)
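(That output has the format of Python's profiler, cProfile/pstats. A minimal sketch of how such a profile can be produced, where run_heuristic() is just a placeholder name for whatever drives my search loop:)

import cProfile
# 'run_heuristic' stands in for the real top-level function of the heuristic
cProfile.run('run_heuristic()', sort='time')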
I tried to apply map to the for loop, but because the loop body isn't a function call, it was not possible.
How could I reduce the time?
To answer eickenberg's comment: sol is a permutation, for example [1,2,3,4]. The function is called when I am generating neighbor solutions, so a neighbor of [1,2,3,4] is [2,1,3,4]. I am changing only two positions in the original permutation and then calling deltaC, which calculates the change in cost for the solution with positions r and s swapped (in the example above r,s = 0,1). This is done to avoid calculating the entire cost of the neighbor solution. I suppose I can store the values of sol[k], sol[r] and sol[s] in local variables to avoid looking them up in each iteration. I do not know if this is what you were asking in your comment.
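Roughly, the usage pattern looks like this (just a sketch, not my actual code):

sol = [1, 2, 3, 4]           # current permutation
r, s = 0, 1                  # positions to swap
neighbor = sol[:]            # copy, then swap two positions -> [2, 1, 3, 4]
neighbor[r], neighbor[s] = neighbor[s], neighbor[r]
# deltaC(r, s, sol) gives the change in cost between sol and neighbor,
# so the full objective does not have to be re-evaluated for every neighbor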
A minimal working example:
import random

distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]]
stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]]

def deltaC(r, s, S=None):
    '''
    Difference in cost C when positions r and s are swapped.
    '''
    sol = S if S is not None else [0, 1, 2, 3]
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(4):
        if k != r and k != s:
            delta += (stream_matrix[r][k]
                      * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]])
                      + stream_matrix[s][k]
                      * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]])
                      + stream_matrix[k][r]
                      * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r])
                      + stream_matrix[k][s]
                      * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s]))
    return delta

for _ in xrange(303878):
    d = deltaC(random.randint(0, 3), random.randint(0, 3))
print d
Now I think the better option is to use NumPy. I tried with Matrix(), but it did not improve the performance.
Well, finally I was able to reduce the time a bit more by combining @TooTone's solution and storing the indexes in a set to avoid the if check. The time has dropped from about 18 seconds to 8 seconds. Here is the code:
def deltaC(self, r, s, sol=None):
    delta = 0
    sol = self.S if sol is None else sol
    sol_r, sol_s = sol[r], sol[s]
    stream_matrix = self._data.stream_matrix
    distance_matrix = self._data.distance_matrix
    indexes = set(xrange(self._tam)) - set([r, s])
    for k in indexes:
        sol_k = sol[k]
        delta += \
            (stream_matrix[r][k] - stream_matrix[s][k]) \
            * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
            + \
            (stream_matrix[k][r] - stream_matrix[k][s]) \
            * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
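(For clarity, the set difference just yields every index except r and s, the same k values the old if test allowed; for example:)

indexes = set(xrange(6)) - set([2, 4])
print indexes  # the indexes 0, 1, 3 and 5; note that a set does not guarantee iteration order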
In order to reduce the time even more, I think the best way would be to write an extension module (e.g. in C).
In the simple example you've given, with for k in xrange(4), the loop body only executes three times (if r == s) or twice (if r != s), and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors, and if the vectors are short the overheads can outweigh the benefits. (Note also that in this formula the matrices are being sliced in different dimensions and indexed non-contiguously, which can only make things more complicated for a vectorizing implementation.)
import numpy as np

distance_matrix_np = np.array(distance_matrix)
stream_matrix_np = np.array(stream_matrix)
n = 4

def deltaC_np(r, s, sol):
    sol_r, sol_s = sol[r], sol[s]
    K = np.array([i for i in xrange(n) if i != r and i != s])
    return np.sum(
        (stream_matrix_np[r, K] - stream_matrix_np[s, K])
        * (distance_matrix_np[sol_s, sol[K]] - distance_matrix_np[sol_r, sol[K]])
        + (stream_matrix_np[K, r] - stream_matrix_np[K, s])
        * (distance_matrix_np[sol[K], sol_s] - distance_matrix_np[sol[K], sol_r]))
In this numpy implementation, rather than a for loop over the elements in K, the operations are applied across all the elements of K within numpy. Also, note that your mathematical expression can be simplified: each term in brackets on the left is the negative of the term in brackets on the right.

This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is equal to -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy.
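Concretely (writing f for the stream matrix and d for the distance matrix), each pair of terms collapses via a(x - y) + b(y - x) = (a - b)(x - y), which is exactly what deltaC2 below exploits:

f_{rk} (d_{sol[s], sol[k]} - d_{sol[r], sol[k]}) + f_{sk} (d_{sol[r], sol[k]} - d_{sol[s], sol[k]})
    = (f_{rk} - f_{sk}) (d_{sol[s], sol[k]} - d_{sol[r], sol[k]})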
It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension
K = np.array([i for i in xrange(n) if i!=r and i!=s])
Once this is replaced with vectorizing code
if r == s:
    K = np.arange(n - 1)
    K[r:] += 1
else:
    K = np.arange(n - 2)
    if r < s:
        K[r:] += 1
        K[s - 1:] += 1
    else:
        K[s:] += 1
        K[r - 1:] += 1
the numpy function is much faster.
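As an aside, the index vector K could also be built with np.delete, which reads more clearly, though I haven't benchmarked it against the branchy version above:

K = np.delete(np.arange(n), [r, s])  # drops positions r and s from 0..n-1 (worth checking the r == s case behaves as intended)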
A graph of run times is shown immediately below (right at the bottom of this answer is the original graph, before optimizing the numpy function). You can see that it makes sense to use either your optimized original code or the numpy code, depending on how large the matrix is.
The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code, optimized to take account of the way the mathematical expression can be simplified.)
def deltaC(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            delta += \
                stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
    return delta
import numpy as np
def deltaC_np(r, s, sol):
    sol_r, sol_s = sol[r], sol[s]
    if r == s:
        K = np.arange(n - 1)
        K[r:] += 1
    else:
        K = np.arange(n - 2)
        if r < s:
            K[r:] += 1
            K[s - 1:] += 1
        else:
            K[s:] += 1
            K[r - 1:] += 1
    #K = np.array([i for i in xrange(n) if i!=r and i!=s]) #TOO SLOW
    return np.sum(
        (stream_matrix_np[r, K] - stream_matrix_np[s, K])
        * (distance_matrix_np[sol_s, sol[K]] - distance_matrix_np[sol_r, sol[K]])
        + (stream_matrix_np[K, r] - stream_matrix_np[K, s])
        * (distance_matrix_np[sol[K], sol_s] - distance_matrix_np[sol[K], sol_r]))
def deltaC2(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            sol_k = sol[k]
            delta += \
                (stream_matrix[r][k] - stream_matrix[s][k]) \
                * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                + \
                (stream_matrix[k][r] - stream_matrix[k][s]) \
                * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta
import time

N = 200
elapsed1s = []
elapsed2s = []
elapsed3s = []
ns = range(10, 410, 10)
for n in ns:
    distance_matrix_np = np.random.uniform(0, n**2, size=(n, n))
    stream_matrix_np = np.random.uniform(0, n**2, size=(n, n))
    distance_matrix = distance_matrix_np.tolist()
    stream_matrix = stream_matrix_np.tolist()
    sol = range(n-1, -1, -1)
    sol_np = np.array(range(n-1, -1, -1))
    Is = np.random.randint(0, n-1, 4)
    Js = np.random.randint(0, n-1, 4)

    total1 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total1 += deltaC(i, j, sol)
    elapsed1 = (time.clock() - start)

    total2 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total2 += deltaC_np(i, j, sol_np)
    elapsed2 = (time.clock() - start)

    total3 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total3 += deltaC2(i, j, sol_np)
    elapsed3 = (time.clock() - start)

    print n, elapsed1, elapsed2, elapsed3, total1, total2, total3
    elapsed1s.append(elapsed1)
    elapsed2s.append(elapsed2)
    elapsed3s.append(elapsed3)

    #Check errors of one method against another
    #err = 0
    #for i in range(min(n, 50)):
    #    for j in range(min(n, 50)):
    #        err += np.abs(deltaC(i, j, sol) - deltaC_np(i, j, sol_np))
    #print err
import matplotlib.pyplot as plt
plt.plot(ns, elapsed1s, label='Original',lw=2)
plt.plot(ns, elapsed3s, label='Optimized',lw=2)
plt.plot(ns, elapsed2s, label='numpy',lw=2)
plt.legend(loc='upper left', prop={'size':16})
plt.xlabel('matrix size')
plt.ylabel('time')
plt.show()
And here is the original graph, before optimizing out the list comprehension in deltaC_np.