Tags: python, numpy, matrix, sparse-matrix

Memory-efficient sparse symmetric matrix calculations


I have to perform a large number of such calculations:

X.dot(Y).dot(Xt)

X = 1 x n matrix

Y = symmetric n x n matrix, each element one of 5 values (e.g. 0, 0.25, 0.5, 0.75, 1)

Xt = n x 1 matrix, transpose of X, i.e. X[np.newaxis].T

X and Y are dense. The problem is that for large n I cannot store and use matrix Y due to memory constraints. I am limited to a single machine, so distributed calculation is not an option.

It occurred to me that Y has 2 features which theoretically can reduce the amount of memory required to store Y:

  1. Elements of Y are covered by a small list of values.
  2. Y is symmetric.

How can I implement this in practice? I have looked up storage of symmetric matrices, but as far as I am aware all numpy matrix multiplications require "unpacking" the symmetry to produce a regular n x n matrix.

I understand numpy is designed for in-memory calculations, so I'm open to alternative python-based solutions not reliant on numpy. I'm also open to sacrificing speed for memory-efficiency.
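
For concreteness, a rough sketch of the memory involved (n = 100,000 is an assumed example size, not from the question):

```python
# Rough memory footprint of a dense n x n matrix Y under different encodings.
# n = 100_000 is an assumed example size.
n = 100_000
float64_bytes = 8 * n * n      # plain np.float64 storage: 8 bytes per element
int8_bytes = n * n             # one small code (0-4) per byte
packed_bytes = n * n // 3      # three base-5 codes packed into one byte

print(f"float64: {float64_bytes / 1e9:.0f} GB")
print(f"int8:    {int8_bytes / 1e9:.0f} GB")
print(f"packed:  {packed_bytes / 1e9:.1f} GB")
```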


Solution

  • UPDATE: I found that a format which crams three matrix elements into one byte (three base-5 digits per int8) is actually quite fast. In the example below the speed penalty is less than 2x compared to direct multiplication using @, while the space saving is more than 20x.

    >>> Y = np.random.randint(0, 5, (3000, 3000), dtype = np.int8)
    >>> i, j = np.triu_indices(3000, 1)
    >>> Y[i, j] = Y[j, i]
    >>> values = np.array([0.3, 0.5, 0.6, 0.9, 2.0])
    >>> x = np.random.random((3000,))
    >>> Ycmp = (np.reshape(Y, (1000, 3, 3000)) * np.array([25, 5, 1], dtype=np.int8)[None, :, None]).sum(axis=1, dtype=np.int8)
    >>> 
    >>> full = values[Y]
    >>> x @ full @ x
    1972379.8153972814
    >>> 
    >>> vtable = values[np.transpose(np.unravel_index(np.arange(125), (5,5,5)))]
    >>> np.dot(np.concatenate([(vtable * np.bincount(row, x, minlength=125)[:, None]).sum(axis=0) for row in Ycmp]), x)
    1972379.8153972814
    >>> 
    >>> from timeit import timeit
    >>> timeit('x @ full @ x', globals=globals(), number=100)
    0.7130507210385986
    >>> timeit('np.dot(np.concatenate([(vtable * np.bincount(row, x, minlength=125)[:, None]).sum(axis=0) for row in Ycmp]), x)', globals=globals(), number=100)
    1.3755558349657804
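
To make the packing transparent, here is a round-trip check (my own sketch, not part of the original answer): each group of three consecutive rows of Y becomes one row of base-5 codes, and `np.unravel_index` splits each code back into its three digits.

```python
import numpy as np

# Sketch: round-trip check of the base-5 packing used above.
# Three digits in 0..4 fit in one int8, since 4*25 + 4*5 + 4 = 124 < 128.
rng = np.random.default_rng(0)
Y = rng.integers(0, 5, (6, 4)).astype(np.int8)   # toy matrix; row count divisible by 3

# pack: each group of 3 consecutive rows becomes one row of base-5 codes
packed = (Y.reshape(2, 3, 4)
          * np.array([25, 5, 1], dtype=np.int8)[None, :, None]).sum(axis=1, dtype=np.int8)

# unpack: np.unravel_index splits each code back into its three digits
digits = np.stack(np.unravel_index(packed, (5, 5, 5)), axis=1)  # shape (2, 3, 4)
restored = digits.reshape(6, 4).astype(np.int8)

assert np.array_equal(restored, Y)
```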
    

    The solutions below are slower and less memory efficient. I'll leave them merely for reference.

    If you can afford one byte per matrix element, then you can use np.bincount like so:

    >>> Y = np.random.randint(0, 5, (1000, 1000), dtype = np.int8)
    >>> i, j = np.triu_indices(1000, 1)
    >>> Y[i, j] = Y[j, i]
    >>> values = np.array([0.3, 0.5, 0.6, 0.9, 2.0])
    >>> full = values[Y]
    >>> # full would correspond to your original matrix,
    >>> # Y is the 'compressed' version
    >>>
    >>> x = np.random.random((1000,))
    >>>
    >>> # direct method for reference 
    >>> x @ full @ x
    217515.13954751115
    >>> 
    >>> # memory saving version
    >>> np.dot([(values * np.bincount(row, x, minlength=5)).sum() for row in Y], x)
    217515.13954751118
    >>>
    >>> # to save almost another 50%, exploit the symmetry of Y
    >>> upper = Y[i, j]
    >>> diag = np.diagonal(Y)
    >>> 
    >>> boundaries = np.r_[0, np.cumsum(np.arange(999, 0, -1))]
    >>> (values * np.bincount(diag, x*x)).sum() + 2 * np.dot(
    ...     [(values * np.bincount(upper[boundaries[i]:boundaries[i+1]], x[i+1:], minlength=5)).sum()
    ...      for i in range(999)], x[:-1])
    217515.13954751115
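
A quick sanity check (my own sketch, not part of the original answer) of the identity behind the per-row trick: `np.bincount` with weights groups the entries of x by code value, so the n-element dot product collapses to a 5-element one.

```python
import numpy as np

# Sketch: verify that values[row] @ x equals the bincount form used above.
# bincount(row, weights=x, minlength=5)[k] sums the x[i] where row[i] == k,
# so the n-element dot product collapses to a 5-element one.
rng = np.random.default_rng(1)
row = rng.integers(0, 5, 1000)
x = rng.random(1000)
values = np.array([0.3, 0.5, 0.6, 0.9, 2.0])

direct = values[row] @ x
grouped = (values * np.bincount(row, weights=x, minlength=5)).sum()
assert np.isclose(direct, grouped)
```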