Search code examples
pythonnumpyoptimization

Fast way to convert upper triangular matrix into symmetric matrix


I have an upper-triangular matrix of np.float64 values, like this:

array([[ 1.,  2.,  3.,  4.],
       [ 0.,  5.,  6.,  7.],
       [ 0.,  0.,  8.,  9.],
       [ 0.,  0.,  0., 10.]])

I would like to convert this into the corresponding symmetric matrix, like this:

array([[ 1.,  2.,  3.,  4.],
       [ 2.,  5.,  6.,  7.],
       [ 3.,  6.,  8.,  9.],
       [ 4.,  7.,  9., 10.]])

The conversion can be done in place, or as a new matrix. I would like it to be as fast as possible. How can I do this quickly?


Solution

  • This is the fastest routine I've found so far that doesn't use Cython or a JIT like Numba. I takes about 1.6 μs on my machine to process a 4x4 array (average time over a list of 100K 4x4 arrays):

    inds_cache = {}
    
    def upper_triangular_to_symmetric(ut):
        n = ut.shape[0]
        try:
            inds = inds_cache[n]
        except KeyError:
            inds = np.tri(n, k=-1, dtype=np.bool)
            inds_cache[n] = inds
        ut[inds] = ut.T[inds]
    

    Here are some other things I've tried that are not as fast:

    The above code, but without the cache. Takes about 8.3 μs per 4x4 array:

    def upper_triangular_to_symmetric(ut):
        n = ut.shape[0]
        inds = np.tri(n, k=-1, dtype=np.bool)
        ut[inds] = ut.T[inds]
    

    A plain Python nested loop. Takes about 2.5 μs per 4x4 array:

    def upper_triangular_to_symmetric(ut):
        n = ut.shape[0]
        for r in range(1, n):
            for c in range(r):
                ut[r, c] = ut[c, r]
    

    Floating point addition using np.triu. Takes about 11.9 μs per 4x4 array:

    def upper_triangular_to_symmetric(ut):
        ut += np.triu(ut, k=1).T
    

    Numba version of Python nested loop. This was the fastest thing I found (about 0.4 μs per 4x4 array), and was what I ended up using in production, at least until I started running into issues with Numba and had to revert back to a pure Python version:

    import numba
    
    @numba.njit()
    def upper_triangular_to_symmetric(ut):
        n = ut.shape[0]
        for r in range(1, n):
            for c in range(r):
                ut[r, c] = ut[c, r]
    

    Cython version of Python nested loop. I'm new to Cython so this may not be fully optimized. Since Cython adds operational overhead, I'm interested in hearing both Cython and pure-Numpy answers. Takes about 0.6 μs per 4x4 array:

    cimport numpy as np
    cimport cython
    
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def upper_triangular_to_symmetric(np.ndarray[np.float64_t, ndim=2] ut):
        cdef int n, r, c
        n = ut.shape[0]
        for r in range(1, n):
            for c in range(r):
                ut[r, c] = ut[c, r]