My work requires constructing large square matrices of order 1000 as NumPy arrays, whose elements are defined analytically as a function of their indices. Right now I create a zero array and loop over the elements to fill in the required values, which by itself takes a hefty amount of time. Is there any way to make the construction more efficient or faster, say by using a GPU or parallel computing or the like?
Example: I want to construct a (1000, 1000) matrix J (which is the Jacobian of the system I'm studying). I have the definition for Jij given by a function J_ij(i, j, a, b), where a and b are parameters which are fixed for a given matrix. I begin by creating the array J of zeros, then iterate over the entries and update them according to the function J_ij. The full codebase is given here, but the following is the only relevant part:
def J_ij(self, i: int, j: int, xi_mat: np.ndarray, eta: np.ndarray):
    """
    Calculates the ij-th element of the jacobian* for input pattern eta

    Parameters
    ----------
    i, j : int
        Indices of the matrix
    xi_mat : np.ndarray
        Memory pattern matrix of shape (p, n)
    eta : np.ndarray
        Input pattern of shape (n, )

    Returns
    -------
    float value of ij-th element of J
    """
    n = self.n
    p = self.p
    jij = 0
    for mu in range(p):
        jij += xi_mat[mu, i]*xi_mat[mu, j]*eta[i]*eta[j]
    if i == j:
        for k in range(n):
            for mu in range(p):
                jij -= xi_mat[mu, i]*xi_mat[mu, k]*eta[i]*eta[k]
    return jij/n
def jacobian(self, eta: np.ndarray):
    """
    Generates the jacobian* for stability analysis

    Parameters
    ----------
    eta : np.ndarray
        Input pattern

    Returns
    -------
    jacobian matrix as a (n, n) numpy array
    """
    n = self.n
    J = np.zeros((n, n))
    if self.xi_mat is None:
        self.init_patterns()
    xi_mat = self.xi_mat
    for idx in range(n**2):
        i = idx // n
        j = idx % n
        J[i, j] += self.J_ij(i, j, xi_mat, eta)
    return J
xi_mat is a matrix whose rows are binary patterns made of 1s and -1s. eta is one such binary pattern.
I tried using cupy instead of numpy, but due to (independent) issues with my dedicated GPU not being recognized by CUDA on my Arch Linux installation, it actually took longer than numpy.
Your jacobian function looks a lot like a straightforward matrix multiply, and can be vectorized simply as follows (assuming that xi_mat is a p-by-n matrix and eta is an n-element vector):
M = xi_mat * eta                   # scale column j of xi_mat by eta[j]
J = M.T @ M                        # J[i, j] = sum_mu xi[mu, i]*eta[i] * xi[mu, j]*eta[j]
J -= np.diag(np.sum(J, axis=0))    # subtract the column sums on the diagonal (the k-loop)
J /= n
As this uses a NumPy matrix multiply, it should run substantially faster than the manual loop. Timing results on my iPhone for n=1000, p=10, and random xi_mat and eta:
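To convince yourself the two versions agree before relying on the speedup, a minimal check on a small random instance (using a standalone copy of your J_ij without the class wrapper, and small n and p so the loop version finishes quickly) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5  # small sizes so the loop version runs in a moment
xi_mat = rng.choice([-1.0, 1.0], size=(p, n))
eta = rng.choice([-1.0, 1.0], size=n)

def J_ij(i, j, xi_mat, eta, n, p):
    # Loop version, as in the question
    jij = 0.0
    for mu in range(p):
        jij += xi_mat[mu, i]*xi_mat[mu, j]*eta[i]*eta[j]
    if i == j:
        for k in range(n):
            for mu in range(p):
                jij -= xi_mat[mu, i]*xi_mat[mu, k]*eta[i]*eta[k]
    return jij/n

J_loop = np.array([[J_ij(i, j, xi_mat, eta, n, p) for j in range(n)]
                   for i in range(n)])

# Vectorized version from above
M = xi_mat * eta
J_vec = M.T @ M
J_vec -= np.diag(np.sum(J_vec, axis=0))
J_vec /= n

print(np.allclose(J_loop, J_vec))  # True
```

Note that np.sum(J_vec, axis=0) sums over the first index, matching the k-loop in J_ij because J_vec is symmetric at that point.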