Does anyone have experience creating a sparse matrix in Python (e.g. using scipy.sparse) whose non-zero values follow a uniform distribution on [-0.5, 0.5], so that it has zero mean (is zero-centred)?
I am aware that the scipy.sparse package provides a few methods for creating random sparse matrices, such as 'rand' and 'random'. However, I could not achieve what I want with those methods. For example, I tried:
import numpy as np
import scipy.sparse as sp
s = np.random.uniform(-0.5,0.5)
W=sp.random(1024, 1024, density=0.01, format='csc', data_rvs=s)
To clarify what I mean: if I wanted the matrix described above to be dense (non-sparse), I would create it with:
dense=np.random.rand(1024,1024)-0.5
'np.random.rand(1024,1024)' creates a dense matrix with values drawn uniformly from [0, 1). To make it zero mean, I centre the matrix by subtracting 0.5 from it.
However, if I create a sparse matrix, say:
sparse=sp.rand(1024,1024,density=0.01, format='csc')
The matrix will have non-zero values drawn uniformly from [0, 1). However, if I want to centre the matrix, I cannot simply do 'sparse -= 0.5', because that would turn all of the originally zero entries into non-zero values.
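To make the dense/sparse distinction concrete: the stored (non-zero) values of a SciPy sparse matrix live in its `.data` array, so shifting that array instead of the whole matrix centres only the non-zero entries and leaves the implicit zeros alone. A minimal sketch, assuming a CSC matrix from `sp.rand` as above; note that this shifts the values into [-0.5, 0.5) but does not make their mean exactly zero.

```python
import numpy as np
import scipy.sparse as sp

# Dense case: shifting every entry maps [0, 1) onto [-0.5, 0.5)
dense = np.random.rand(1024, 1024) - 0.5

# Sparse case: the stored (non-zero) values live in the .data array
sparse = sp.rand(1024, 1024, density=0.01, format='csc')
nnz_before = sparse.nnz

# Shift only the stored values; implicit zeros stay zero,
# so the sparsity pattern is unchanged
sparse.data -= 0.5

print(sparse.nnz == nnz_before)
print(sparse.data.min() >= -0.5, sparse.data.max() < 0.5)
```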
So how can I do for a sparse matrix what the dense example above does?
Thank you for all of your help!
The data_rvs parameter expects a "callable" that takes a size. This isn't exactly obvious from the documentation. You can supply one with a lambda as follows:
import numpy as np
import scipy.sparse as sp
W = sp.random(1024, 1024, density=0.01, format='csc',
data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))
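A quick sanity check of the result. This is a sketch, and the specific checks are additions rather than part of the original answer: the number of stored entries should be about density × 1024 × 1024 (roughly 10486), every stored value should fall in [-0.5, 0.5), and the mean of the stored values should be close to, but not exactly, zero.

```python
import numpy as np
import scipy.sparse as sp

# data_rvs receives the number of non-zeros and must return that many values
W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))

# Roughly density * rows * cols entries are stored
print(W.shape, W.nnz)
# All stored values lie in [-0.5, 0.5)
print(W.data.min() >= -0.5, W.data.max() < 0.5)
# The sample mean of the non-zero values is near 0, not exactly 0
print(W.data.mean())
```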
Then print(W) gives:
(243, 0) -0.171300809713
(315, 0) 0.0739590145626
(400, 0) 0.188151369316
(440, 0) -0.187384896218
: :
(1016, 0) 0.29262088084
(156, 1) -0.149881296136
(166, 1) -0.490405135834
(191, 1) 0.188167190147
(212, 1) 0.0334533020488
: :
(411, 1) 0.122330200832
(431, 1) -0.0494334160833
(813, 1) -0.0076379249885
(828, 1) 0.462807265425
: :
(840, 1021) 0.456423017883
(12, 1022) -0.47313075329
: :
(563, 1022) -0.477190349161
(655, 1022) -0.460942546313
(673, 1022) 0.0930207181126
(676, 1022) 0.253643616387
: :
(843, 1023) 0.463793903168
(860, 1023) 0.454427252782
For the newbie, the lambda may look odd: it is just an unnamed function. The sp.random function takes an optional argument data_rvs that defaults to None. When specified, it is expected to be a function that takes a size argument and returns that many random numbers. A simple function to do this would be:
def generate_n_uniform_randoms(n):
    return np.random.uniform(-0.5, 0.5, n)
I don't know the origin of the API, but the shape is not needed, presumably because sp.random first figures out which indices will be non-zero and then only needs random values for those indices, a set of known size.
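To illustrate why a count is all the callable needs, here is a simplified, hypothetical re-implementation of that idea. It is not SciPy's actual code, and the name `sketch_random` is made up for this example: pick k distinct positions first, then ask `data_rvs` for exactly k values and assemble a COO matrix.

```python
import numpy as np
import scipy.sparse as sp

def sketch_random(m, n, density, data_rvs, rng=None):
    """Hypothetical, simplified stand-in for sp.random (illustration only)."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(round(density * m * n))           # number of non-zeros
    # 1) choose k distinct flat positions, without replacement
    flat = rng.choice(m * n, size=k, replace=False)
    rows, cols = np.divmod(flat, n)           # row-major: flat = row * n + col
    # 2) only now are values needed, and only k of them
    vals = data_rvs(k)
    return sp.coo_matrix((vals, (rows, cols)), shape=(m, n)).tocsc()

W = sketch_random(1024, 1024, 0.01,
                  lambda k: np.random.uniform(-0.5, 0.5, k))
print(W.shape, W.nnz)
```

Because positions are drawn without replacement, no two values land on the same entry, so the resulting matrix stores exactly k entries.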
The lambda is just syntactic sugar that lets us define that function inline in terms of another function call. We could instead write:
W = sp.random(1024, 1024, density=0.01, format='csc',
data_rvs=generate_n_uniform_randoms)
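Besides a lambda or a named def, the standard library's `functools.partial` can build the same callable by fixing the `low` and `high` arguments of `np.random.uniform`, leaving `size` as the only remaining positional argument. A minimal sketch of this alternative style (the name `uniform_centered` is just for illustration; the original answer does not use partial):

```python
from functools import partial

import numpy as np
import scipy.sparse as sp

# uniform_centered(n) calls np.random.uniform(-0.5, 0.5, n)
uniform_centered = partial(np.random.uniform, -0.5, 0.5)

W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=uniform_centered)
print(W.data.min() >= -0.5, W.data.max() < 0.5)
```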
In fact, data_rvs can be any "callable": some object f for which f(n) returns n random variables. This can be a function, but it can also be an instance of a class that implements the __call__(self, n) method. For example:
class ufoo(object):
    def __call__(self, n):
        import numpy
        return numpy.random.uniform(-0.5, 0.5, n)

W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=ufoo())
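A callable class becomes more useful once it carries parameters. A minimal sketch, assuming you want configurable bounds and a dedicated NumPy `Generator` for the values. `UniformSampler` is a hypothetical helper; sp.random only requires that f(n) returns n values.

```python
import numpy as np
import scipy.sparse as sp

class UniformSampler:
    """Callable returning n samples drawn uniformly from [low, high).

    Hypothetical helper for illustration only.
    """
    def __init__(self, low=-0.5, high=0.5, rng=None):
        self.low = low
        self.high = high
        self.rng = np.random.default_rng() if rng is None else rng

    def __call__(self, n):
        # sp.random calls this with the number of non-zero entries
        return self.rng.uniform(self.low, self.high, n)

sampler = UniformSampler(rng=np.random.default_rng(42))
W = sp.random(1024, 1024, density=0.01, format='csc', data_rvs=sampler)
print(W.nnz, W.data.min() >= -0.5, W.data.max() < 0.5)
```

Keeping a separate seeded generator inside the sampler makes the stream of values reproducible on its own, independently of how the non-zero positions are chosen.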
If you need the mean to be exactly zero (within roundoff of course), this can be done by subtracting the mean from the non-zero values, as I mentioned above:
W.data -= np.mean(W.data)
Then:
W[idx].mean()
-2.3718641632430623e-18
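A self-contained sketch of that centring step, checking the mean of the whole matrix (implicit zeros included) rather than only the stored values. Because the zeros contribute nothing to the sum, zeroing the mean of `W.data` also zeros the mean of the full matrix up to roundoff. One side effect worth knowing: after the shift, a few values may land slightly outside [-0.5, 0.5].

```python
import numpy as np
import scipy.sparse as sp

W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))

# Centre the stored values; the implicit zeros are unaffected
W.data -= np.mean(W.data)

# Mean over all 1024 * 1024 entries, zeros included
full_mean = W.mean()
print(W.data.mean(), full_mean)
# The shift can push a few values slightly past +/-0.5
print(W.data.min(), W.data.max())
```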