I have a bunch of vectors that are stored in the columns of a matrix U. I have also a matrix V containing column vectors. Each vector in V can either be
The goal is to find an array containing either 1 if the vectors have the same direction, or -1 if they have opposite directions.
Two additional things:
I have found three ways so far to solve this problem:
from timeit import timeit
import numpy as np
# create fake SVD output
n_components = 100
n_samples = 10000
U = np.random.random((n_samples, n_samples - 1))
true_signs = np.sign(np.random.random(n_samples - 1) - 0.5)
Vt = np.multiply(U, true_signs[None, :]).T
# simulate some numerical imprecision
Vt += np.random.random((n_samples - 1, n_samples)) * 0.001
# 3 competing methods
def compute_signs_dot():
VtU = np.dot(Vt[:n_components, :], U[:, :n_components])
signs = np.sign(np.diag(VtU))
np.testing.assert_equal(signs, true_signs[:n_components])
def compute_signs_mul():
diag_VtU = np.multiply(Vt[:n_components, :].T,
U[:, :n_components]).sum(axis=0)
signs = np.sign(diag_VtU)
np.testing.assert_equal(signs, true_signs[:n_components])
def compute_signs_sign():
signs = np.multiply(np.sign(Vt[:n_components, :].T),
np.sign(U[:, :n_components])).sum(axis=0)
signs = np.sign(signs)
np.testing.assert_equal(signs, true_signs[:n_components])
# compare execution times
print("compute_signs_dot: %.3fs" % timeit(compute_signs_dot, number=100))
print("compute_signs_mul: %.3fs" % timeit(compute_signs_mul, number=100))
print("compute_signs_sign: %.3fs" % timeit(compute_signs_sign, number=100))
yields
compute_signs_dot: 2.001s
compute_signs_mul: 0.786s
compute_signs_sign: 1.693s
So it seems that the fastest method so far is to compute the scalar product between pairs of vectors at each column index by multiplying term by term and sum (compute_signs_mul
).
Any other method would be appreciated, faster or with similar speed.
Note: as some readers will have noted, this is postprocessing on output from a truncated SVD in order to transform the singular values into eigenvalues by finding their signs.
We can use np.einsum
-
diag_VtU = np.einsum('ji,ij->j',Vt[:n_components, :], U[:, :n_components])
Alternatively, with np.matmul/@-operator
to get diag_VtU
-
(Vt[:n_components, :][:,None,:] @ (U[:, :n_components].T)[:,:,None])[:,0,0]