The following code runs in 45s when using pure Python.
for iteration in range(maxiter):
    for node in range(n):
        for dest in adjacency_list[node]:
            rs[iteration + 1][dest] += beta * rs[iteration][node] / len(adjacency_list[node])
But by simply initializing rs as a numpy ndarray instead of a Python list of lists, the code runs in 145s. I do not really know why numpy takes three times as long with this kind of array indexing.
My idea was to vectorize as much as possible, but I have only managed to vectorize the multiplication by beta / len(adjacency_list[node]). This code runs in 77s.
beta_over_out_degree = np.array([beta / len(al) for al in adjacency_list])
for iteration in range(1, maxiter + 1):
    r_next = np.full(shape=n, fill_value=(1 - beta) / n)  # start every node at (1 - beta) / n
    f = beta_over_out_degree * r  # amount each node pushes to each of its neighbors
    for i in range(n):
        r_next[adjacency_list[i]] += f[i]
    r = np.copy(r_next)
    rs[iteration] = np.copy(r)
The problem is that adjacency_list is a list of lists with rows of differing length: 100 000 rows with 1-15 entries each. A more standard approach with an adjacency matrix, at least as a dense ndarray, is not an option, since for n = 100 000 its (n, n) shape is far too big to fit in memory. Is there any way to vectorize this using the adjacency list's indices for numpy advanced indexing (maybe by turning it into a numpy ndarray)? I would also greatly appreciate any other speed tips. Thanks in advance!
EDIT: Thanks to @stevemo I managed to create adjacency_matrix with csr_matrix functionality and used it for iterative multiplication. The program now runs in only 2s!
for iteration in range(1, 101):
    rs[iteration] += rs[iteration - 1] * adjacency_matrix
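The exact construction of adjacency_matrix isn't shown above, but something along these lines should work (a sketch, assuming the beta / out-degree weights are folded into the nonzero values so that the update above matches the original loops; scipy.sparse is the only extra dependency):

import numpy as np
from scipy.sparse import csr_matrix

# Sketch of the construction: one nonzero per edge (src, dest), weighted by
# beta / out-degree of src, so that r_next = r @ adjacency_matrix performs
# the whole inner loop in a single sparse vector-matrix product.
rows = np.concatenate([[i] * len(al) for i, al in enumerate(adjacency_list)])
cols = np.concatenate(adjacency_list)
vals = np.concatenate([[beta / len(al)] * len(al) for al in adjacency_list])
adjacency_matrix = csr_matrix((vals, (rows, cols)), shape=(n, n))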
If I understand you correctly, this can be done with a one-liner formula using matrix powers of the adjacency matrix.
Based on your original code snippet, it seems you have some network of n nodes, with the adjacency information stored as a list of lists in adjacency, and you have a value r associated with each node such that its value at iteration k+1 is beta times the sum of r over each of its neighbors at iteration k. (Your loop constructs this in the opposite direction, but it's the same thing.)
If you don't mind reforming your adjacency list-of-lists into a more standard adjacency matrix, such that A_ij = 1 if i and j are neighbors and 0 otherwise, then you could accomplish the inner two loops with a simple matrix product, r[k+1] = beta * (A @ r[k]).
And following this logic, r[k+2] = beta * (A @ (beta * (A @ r[k]))) = (beta * A)**2 @ r[k]
or in general,
r[k] = (beta * A)**k @ r[0]
Let's try this on a small network:
import numpy as np

# adjacency matrix
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0]
])

# initial values
n = 5
beta = 0.5
r0 = np.ones(n)
maxiter = 10

# after one iteration
print(beta * (A @ r0))
# [1. 1. 1.5 1. 0.5]

# after 10 iterations
print(np.linalg.matrix_power((beta * A), maxiter) @ r0)
# [2.88574219 2.88574219 3.4921875 1.99414062 0.89257812]
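A note on scale: for the n = 100 000 case from your question, a dense (n, n) array won't fit in memory, but the same matrix product carries over to a sparse matrix (a sketch of the sparse version of the example above; csr_matrix is the format your edit already uses, and np.linalg.matrix_power doesn't accept sparse matrices, hence the explicit loop):

from scipy.sparse import csr_matrix

# Same small example, but sparse; for 100 000 nodes with 1-15 neighbors each
# this stores only one entry per edge.
A_sparse = csr_matrix(A)

r = r0
for _ in range(maxiter):
    r = beta * (A_sparse @ r)  # same update as the dense r[k+1] = beta * (A @ r[k])
print(r)
# should match np.linalg.matrix_power((beta * A), maxiter) @ r0 above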