
Fastest way in numpy to check if vectors are aligned or have opposite direction (truncated SVD post processing)


I have a bunch of vectors that are stored in the columns of a matrix U. I have also a matrix V containing column vectors. Each vector in V can either be

  • almost identical to its counterpart in U, with numerical approximations
  • or have an opposite sign, with numerical approximations.

The goal is to find an array containing either 1 if the vectors have the same direction, or -1 if they have opposite directions.
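For a single pair of (roughly unit-length) vectors, the sign of their dot product is exactly this indicator; a minimal sketch with made-up vectors:

```python
import numpy as np

u = np.array([0.6, 0.8])           # a unit vector
v = -u + np.array([1e-4, -1e-4])   # opposite direction, plus small noise

# u . v is close to -1, so its sign recovers the relationship
sign = np.sign(np.dot(u, v))
# sign == -1.0
```

The three methods below all reduce to computing this dot product for every column pair at once.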

Two additional things:

  • I am only interested in the first n_components signs, not all of them (100 out of 9,999 in the example below).
  • V is given in its transposed form (Vt)

I have found three ways so far to solve this problem:

from timeit import timeit
import numpy as np

# create fake SVD output
n_components = 100
n_samples = 10000
U = np.random.random((n_samples, n_samples - 1))
true_signs = np.sign(np.random.random(n_samples - 1) - 0.5)
Vt = np.multiply(U, true_signs[None, :]).T
# simulate some numerical imprecision
Vt += np.random.random((n_samples - 1, n_samples)) * 0.001

# 3 competing methods
def compute_signs_dot():
    VtU = np.dot(Vt[:n_components, :], U[:, :n_components])
    signs = np.sign(np.diag(VtU))
    np.testing.assert_equal(signs, true_signs[:n_components])

def compute_signs_mul():
    diag_VtU = np.multiply(Vt[:n_components, :].T,
                          U[:, :n_components]).sum(axis=0)
    signs = np.sign(diag_VtU)
    np.testing.assert_equal(signs, true_signs[:n_components])

def compute_signs_sign():
    signs = np.multiply(np.sign(Vt[:n_components, :].T),
                        np.sign(U[:, :n_components])).sum(axis=0)
    signs = np.sign(signs)
    np.testing.assert_equal(signs, true_signs[:n_components])

# compare execution times
print("compute_signs_dot: %.3fs" % timeit(compute_signs_dot, number=100))
print("compute_signs_mul: %.3fs" % timeit(compute_signs_mul, number=100))
print("compute_signs_sign: %.3fs" % timeit(compute_signs_sign, number=100))

yields

compute_signs_dot: 2.001s
compute_signs_mul: 0.786s
compute_signs_sign: 1.693s

So the fastest method so far is to compute each pairwise scalar product by element-wise multiplication followed by a column-wise sum (compute_signs_mul), rather than forming the full matrix product and discarding everything off the diagonal.

Any other method, faster or of similar speed, would be appreciated.

Note: as some readers will have noticed, this is post-processing of the output of a truncated SVD, used to turn singular values into eigenvalues by recovering their signs.
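To illustrate that use case, here is a minimal sketch assuming a symmetric input matrix (where eigenvalues are the singular values up to sign, and the sign of each eigenvalue equals the sign of u_i . v_i):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((6, 6))
A = (A + A.T) / 2                  # symmetric matrix: real eigenvalues

U, s, Vt = np.linalg.svd(A)        # s holds |eigenvalues|, sorted descending
signs = np.sign(np.einsum('ji,ij->j', Vt, U))  # sign of each u_i . v_i
eigvals = signs * s                # signed eigenvalues, ordered by magnitude

# agrees with a direct symmetric eigensolver (up to ordering)
ref = np.sort(np.linalg.eigvalsh(A))
```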


Solution

  • We can use np.einsum to compute only the diagonal of the product:

    diag_VtU = np.einsum('ji,ij->j', Vt[:n_components, :], U[:, :n_components])
    

    Alternatively, np.matmul (the @ operator) can compute diag_VtU as a batch of (1 x n) @ (n x 1) products:

    diag_VtU = (Vt[:n_components, :][:, None, :] @
                (U[:, :n_components].T)[:, :, None])[:, 0, 0]
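A quick self-contained check, using small random matrices with names chosen here for illustration, that both expressions match the diagonal of Vt @ U:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 10
U = rng.random((n, k))
Vt = rng.random((k, n))

# reference: full (k x k) product, then extract the diagonal
ref = np.diag(Vt @ U)

# einsum computes only the k diagonal entries, skipping the off-diagonal work
d_einsum = np.einsum('ji,ij->j', Vt, U)

# batched matmul: k independent (1 x n) @ (n x 1) products, one per column pair
d_matmul = (Vt[:, None, :] @ (U.T)[:, :, None])[:, 0, 0]
```

Both avoid the O(k^2 n) cost of the full product in compute_signs_dot, doing only O(k n) work like compute_signs_mul.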