I have two SciPy sparse matrices, `a` and `b`, holding boolean values. `a` is much bigger than `b`: 765565 stored elements versus just 3.
```
In [211]: a
Out[211]: <388839x8455 sparse matrix of type '<class 'numpy.bool_'>'
    with 765565 stored elements in Compressed Sparse Row format>

In [212]: b
Out[212]: <5x3 sparse matrix of type '<class 'numpy.bool_'>'
    with 3 stored elements in Compressed Sparse Row format>
```
But when I check their memory usage with `sys.getsizeof`, both come out at just 56 bytes:
```
In [213]: from sys import getsizeof
          'Size of a: {}. Size of b: {}'.format(getsizeof(a), getsizeof(b))
Out[213]: 'Size of a: 56. Size of b: 56'
```
How come these matrices' sizes are the same, while matrix 'a' has to store over 200 thousand times more values than matrix 'b'?
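`sys.getsizeof` only reports the size of the matrix object itself, not the NumPy arrays it references. A CSR matrix keeps its contents in three such arrays (`data`, `indices` and `indptr`), so every CSR matrix looks equally small to `getsizeof`. To get the real footprint, add up the `nbytes` of those arrays. Here is a small demo: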
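```python
import numpy as np
from scipy import sparse

M = sparse.random(10**4, 10**3, .001, 'csr')

def sparse_memory_usage(mat):
    """Bytes used by the data, indptr and indices arrays of a CSR/CSC matrix."""
    try:
        return mat.data.nbytes + mat.indptr.nbytes + mat.indices.nbytes
    except AttributeError:
        # Not a compressed sparse matrix (e.g. a dense ndarray has no indptr)
        return -1
```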
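`sparse_memory_usage` only covers formats that carry an `indptr` array and reports everything else as `-1`. If you also need COO matrices or dense arrays, a minimal sketch could branch on the matrix's `format` attribute instead. The name `backing_nbytes` is hypothetical, and the sketch still ignores formats such as DOK, LIL and DIA (it raises `TypeError` for them rather than returning a sentinel):

```python
import numpy as np
from scipy import sparse

def backing_nbytes(mat):
    """Bytes held by the NumPy arrays behind `mat` (hypothetical helper).

    Handles dense ndarrays and the CSR, CSC, BSR and COO sparse formats;
    anything else raises TypeError instead of returning -1.
    """
    if isinstance(mat, np.ndarray):
        return mat.nbytes
    if sparse.issparse(mat):
        if mat.format in ("csr", "csc", "bsr"):
            # Compressed formats: values, column/row indices, row/column pointers
            return mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes
        if mat.format == "coo":
            # Coordinate format: values plus one row and one column index each
            return mat.data.nbytes + mat.row.nbytes + mat.col.nbytes
    raise TypeError(f"unsupported type/format: {type(mat).__name__}")

M = sparse.random(100, 50, density=0.1, format="csr")
print(backing_nbytes(M), backing_nbytes(M.tocoo()), backing_nbytes(M.toarray()))
```

Raising an exception for unknown inputs avoids mixing a `-1` sentinel into totals if you later sum the sizes of many objects.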
```
In [140]: sparse_memory_usage(np.random.rand(100, 100))
Out[140]: -1

In [141]: M = sparse.random(10**4, 10**3, .001, 'csr')

In [142]: sparse_memory_usage(M)
Out[142]: 160004

In [144]: M
Out[144]:
<10000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 10000 stored elements in Compressed Sparse Row format>
```
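To connect this back to the question, the sketch below compares `getsizeof` with the summed `nbytes` for two CSR matrices of very different sizes, then estimates the footprint of `a` and `b` from their shapes alone. The helper `csr_nbytes_estimate` is hypothetical, and it assumes 1-byte booleans and 32-bit (`int32`) index arrays, which SciPy typically uses when the indices fit; with 64-bit indices the index terms double:

```python
import sys
from scipy import sparse

# Two CSR matrices with very different numbers of stored elements.
small = sparse.random(10, 10, density=0.03, format="csr")
big = sparse.random(10**4, 10**3, density=0.001, format="csr")

# sys.getsizeof is shallow: it counts only the matrix object itself,
# not the NumPy arrays it points to, so both report the same size.
print(sys.getsizeof(small), sys.getsizeof(big))

def csr_nbytes(mat):
    """Bytes used by the data, indices and indptr arrays of a CSR/CSC matrix."""
    return mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes

print(csr_nbytes(small), csr_nbytes(big))

def csr_nbytes_estimate(n_rows, nnz, data_itemsize, index_itemsize=4):
    """Hypothetical estimate for a CSR matrix: nnz values, nnz column
    indices and n_rows + 1 row pointers (int32 indices assumed)."""
    return nnz * data_itemsize + nnz * index_itemsize + (n_rows + 1) * index_itemsize

# The question's matrices, assuming 1-byte booleans and int32 indices.
print(csr_nbytes_estimate(388839, 765565, 1))  # 5383185 (about 5.4 MB)
print(csr_nbytes_estimate(5, 3, 1))            # 39
```

Under those assumptions `a` needs roughly 5.4 MB of array storage against 39 bytes for `b`, even though `getsizeof` reports 56 bytes for both.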