Search code examples
pythonarraysnumpyscipysparse-matrix

Efficient way to decompress and multiply sparse arrays in python


In a database I have a compressed frequency array. The first value represents the full array index, and the second value represents the frequency. This is compressed to only non-0 values because it is pretty sparse - less than 5% non-0's. I am trying to decompress the array, and then I need the dot product of this array with an array of weights to get a total weight. This is very inefficient with larger arrays. Does anyone have a more efficient way of doing this? For example, should I be using scipy.sparse and just leave the compressedfreqs array as-is? Or maybe is there a more efficient list comprehension I should be doing instead of looping through each item?

Here is a smaller example of what I am doing:

import numpy as np

compressedfreqs = [(1,4),(3,2),(9,8)]
weights = np.array([4,4,4,3,3,3,2,2,2,1])

freqs = np.array([0] * 10)
for item in compressedfreqs:
    freqs[item[0]] = item[1]

totalweight =  np.dot(freqs,weights)
print totalweight

Solution

  • You could use scipy.sparse to handle all that for you:

    >>> import scipy.sparse as sps
    >>> cfq = np.array([(1,4),(3,2),(9,8)])
    >>> cfq_sps = sps.coo_matrix((cfq[:,1], ([0]*len(cfq), cfq[:,0])))
    >>> cfq_sps
    <1x10 sparse matrix of type '<type 'numpy.int32'>'
        with 3 stored elements in COOrdinate format>
    >>> cfq_sps.A # convert to dense array
    array([[0, 4, 0, 2, 0, 0, 0, 0, 0, 8]])
    >>> weights = np.array([4,4,4,3,3,3,2,2,2,1])
    >>> cfq_sps.dot(weights)
    array([30])
    

    If you prefer to not use the sparse module, you can get it to work, albeit probably slower, with a generator expression:

    >>> sum(k*weights[j] for j,k in cfq)
    30