I have a starting lil_array of boolean values (Docs). For example, with dimensions (3, 5) and the following values:
(0, 2) True
(0, 4) True
(1, 0) True
(1, 1) True
(1, 3) True
(2, 0) True
(2, 3) True
Graphically, this is a matrix like the following (True represented as 1):
0 0 1 0 1
1 1 0 1 0
1 0 0 1 0
I also have an np.ndarray of the same length as the rows (i.e., one int value per column). For this example I will use the following:
arr = np.array([0, -1, 0, 3, 2])
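(For reproducibility, the toy example above can be built as follows; this assumes a recent SciPy version that provides scipy.sparse.lil_array, and uses cond as a name for the boolean mask.)

import numpy as np
from scipy.sparse import lil_array

# Boolean mask with the pattern shown above.
cond = lil_array((3, 5), dtype=bool)
cond[0, 2] = True
cond[0, 4] = True
cond[1, 0] = True
cond[1, 1] = True
cond[1, 3] = True
cond[2, 0] = True
cond[2, 3] = True

# The dense int vector shown above.
arr = np.array([0, -1, 0, 3, 2])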
I want to produce the following lil_array (zeroes will not be stored in the sparse array, of course):
0 0 0 0 2
0 -1 0 3 0
0 0 0 3 0
where each row is the logical AND of the corresponding row of the initial lil_array and arr, i.e., it keeps the values of arr where the row is True and is zero elsewhere.
I know how to do this in several ways by first transforming the lil_array into a dense matrix, or its rows into ndarrays or lists, but that would lose the efficiency gained by exploiting the sparsity of this matrix. (This is a toy example, but my real problem involves a much bigger matrix.)
How can I produce the output in an efficient and clean way without turning the sparse array into a dense matrix?
First, I would steer you away from lil_array if your primary concern is memory efficiency. LIL uses lists internally, and lists in Python are not very memory efficient. lil_array is mainly useful if you want to create a sparse array but don't know how many non-zero elements it will have. I would steer you toward COO or CSR instead.
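As a small sketch (reusing cond as the name of the boolean mask from the question), the conversion is a one-liner once the array has been built:

# Convert the LIL mask to CSR once it is fully built; CSR stores the same
# entries in compact NumPy arrays instead of per-row Python lists.
cond = cond.tocsr()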
Second, what you're looking for is essentially np.where(boolean_array, arr, 0), except that you want it to work on sparse arrays without first converting them to dense arrays.
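On dense data, that one-liner would look like the following; this is shown only to pin down the intended semantics, since calling .toarray() densifies the mask, which is exactly what you want to avoid on a large matrix:

import numpy as np

# Dense reference: keep arr where the mask is True, 0 elsewhere.
dense_result = np.where(cond.toarray(), arr, 0)
# array([[ 0,  0,  0,  0,  2],
#        [ 0, -1,  0,  3,  0],
#        [ 0,  0,  0,  3,  0]])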
Some googling for "scipy sparse where" finds this thread, which contains almost exactly the code we need.
The where1 function there is almost what we need, except that it assumes the second array it indexes into is a 2D sparse array, whereas what you have is a 1D dense array. We can fix that by removing the calls to .tocsr() and indexing by column only.
Code:
import numpy as np
import scipy.sparse

def where1(cond, x):
    """Return a sparse array holding the values of x at the True positions of cond."""
    assert len(x.shape) == 1, "x should be 1d"
    # Coordinates of the True entries of the sparse boolean mask.
    row, col = cond.nonzero()
    # Allocate a data buffer of the right size and dtype; it is filled in below.
    data = np.empty(row.shape, dtype=x.dtype)
    zs = scipy.sparse.coo_matrix((data, (row, col)), shape=cond.shape)
    # Index x by column only (it is 1d) and write its values into the result.
    xx = x[col]
    zs.data[:] = xx
    # Drop explicitly stored zeros (positions where x happens to be 0).
    zs.eliminate_zeros()
    # zs = zs.tolil()
    return zs
(Uncomment zs = zs.tolil() if you want the result in LIL format.)
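As a quick usage sketch on the toy data from the question (cond being the boolean lil_array and arr the dense vector):

result = where1(cond, arr)   # COO result unless you uncomment the .tolil() line
print(result.toarray())
# [[ 0  0  0  0  2]
#  [ 0 -1  0  3  0]
#  [ 0  0  0  3  0]]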
According to a benchmark, this is about 100x faster than @Malcolm's answer for a 100000x10000 matrix with 0.1% nonzero elements.