I'd like to perform SVD (singular value decomposition) via cupy.linalg.svd
over a stack of matrices, computing the SVD per-matrix.
import cupy as cp
arr = cp.random.uniform(size=(1000, 3, 3), dtype=cp.float32)
sing_vals = cp.linalg.svd(arr, compute_uv=False, full_matrices=False)
This gives an error claiming that arr
is not 2D. Apparently, cp.linalg.svd
can only compute the SVD of a single 2D matrix. However, numpy.linalg.svd
computes the SVD always over the last two axes of an array which is much more powerful.
Is there a way to efficiently compute the SVD over a stack of matrices in cupy
?
Better yet, is there a general method to efficiently apply_along_axis
?
The current implementation of CuPy calls cusolverDn<t>gesvd()
, which does not support batch computation. For an efficient batch computation, I suppose CuPy has to invoke a CUDA API that receives batched inputs.
FYI to improve CuPy, cuSOLVER has cusolverDn<t>gesvdjBatched()
and cusolverDn<t>gesvdaStridedBatched()
, which seem possible to be used for batched SVD (of dense general matrices). I don't have an idea on the difference among SVD algorithms. j
stands for Jacobi method and a
stands for approximate.