Create a Sparse Zero Mean Random Matrix

Does anyone have experience creating a sparse matrix in Python (e.g. using scipy.sparse) whose non-zero values follow a uniform distribution on [-0.5, 0.5] and have zero mean (zero-centered)?

I am aware that the scipy.sparse package provides a few methods for creating random sparse matrices, like 'rand' and 'random'. However, I could not achieve what I want with those methods. For example, I tried:

import numpy as np
import scipy.sparse as sp

s = np.random.uniform(-0.5,0.5)
W=sp.random(1024, 1024, density=0.01, format='csc', data_rvs=s)

To clarify my idea: if I wanted the above-mentioned matrix in non-sparse (dense) form, I would create it with:

dense=np.random.rand(1024,1024)-0.5

'np.random.rand(1024,1024)' creates a dense matrix with values uniformly distributed in [0,1). To make it zero mean, I centre the matrix by subtracting 0.5.

However, if I create a sparse matrix, say:

sparse=sp.rand(1024,1024,density=0.01, format='csc')

The matrix will have non-zero values uniformly distributed in [0,1). However, if I want to centre the matrix, I cannot simply do 'sparse-=0.5', because the subtraction would turn all the originally zero entries into non-zero ones.

So, how can I achieve for a sparse matrix what the example above does for a dense matrix?

Thank you for all of your help!

Solution

The data_rvs parameter is expecting a "callable" that takes a size. This isn't exactly obvious from the documentation. This can be done with a lambda as follows:

import numpy as np
import scipy.sparse as sp

W = sp.random(1024, 1024, density=0.01, format='csc', 
              data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))

Then print(W) gives:

  (243, 0)  -0.171300809713
  (315, 0)  0.0739590145626
  (400, 0)  0.188151369316
  (440, 0)  -0.187384896218
    :   :
  (1016, 0) 0.29262088084
  (156, 1)  -0.149881296136
  (166, 1)  -0.490405135834
  (191, 1)  0.188167190147
  (212, 1)  0.0334533020488
  : :
  (411, 1)  0.122330200832
  (431, 1)  -0.0494334160833
  (813, 1)  -0.0076379249885
  (828, 1)  0.462807265425
  : :
  (840, 1021)   0.456423017883
  (12, 1022)    -0.47313075329
   :    :
  (563, 1022)   -0.477190349161
  (655, 1022)   -0.460942546313
  (673, 1022)   0.0930207181126
  (676, 1022)   0.253643616387
   :    :
  (843, 1023)   0.463793903168
  (860, 1023)   0.454427252782

For the newbie, the lambda may look odd - this is just an unnamed function. The sp.random function takes an optional argument data_rvs that defaults to None. When specified, it is expected to be a function that takes a size argument and returns that number of random numbers. A simple function to do this would be:

def generate_n_uniform_randoms(n):
    # Return n samples drawn uniformly from [-0.5, 0.5)
    return np.random.uniform(-0.5, 0.5, n)

I don't know the origin of the API, but the shape is not needed as sp.random presumably first figures out which indices will be non-zero, and then it just needs to compute random values for those indices, which is a set of a known size.

The lambda is just syntactic sugar that allows us to define that function inline in terms of some other function call. We could instead write

W = sp.random(1024, 1024, density=0.01, format='csc', 
              data_rvs=generate_n_uniform_randoms)
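
To make the point about the size argument concrete, here is a minimal sketch that records what sp.random passes to data_rvs. The names calls and recording_uniform are made up for illustration, and the exact count may differ by one between SciPy versions depending on how density * m * n is rounded.

```python
import numpy as np
import scipy.sparse as sp

calls = []  # hypothetical helper list: records each size data_rvs receives

def recording_uniform(n):
    # Note the requested size, then return n samples from [-0.5, 0.5).
    calls.append(n)
    return np.random.uniform(-0.5, 0.5, n)

W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=recording_uniform)

# The requested size should match the number of stored entries.
print(calls, W.nnz)
```

The size requested should equal W.nnz, i.e. roughly 1% of the 1024 * 1024 entries, which is why no shape is needed.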

Actually, data_rvs can be any "callable" - some object f for which f(n) returns n random values. This can be a function, but it can also be an instance of a class that implements the __call__(self, n) method. For example:

class ufoo(object):
    """Callable object: ufoo()(n) returns n uniform samples from [-0.5, 0.5)."""

    def __call__(self, n):
        return np.random.uniform(-0.5, 0.5, n)

W = sp.random(1024, 1024, density=0.01, format='csc', 
              data_rvs=ufoo())
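
Since any callable works, another option (not part of the original answer, just a sketch) is functools.partial, which pre-fills the bounds so that the one remaining argument is the size. The name uniform_centered is only illustrative.

```python
import functools
import numpy as np
import scipy.sparse as sp

# uniform_centered(n) is equivalent to np.random.uniform(-0.5, 0.5, n)
uniform_centered = functools.partial(np.random.uniform, -0.5, 0.5)

W = sp.random(1024, 1024, density=0.01, format='csc',
              data_rvs=uniform_centered)
```

This avoids both the lambda and the helper class, at the cost of being a little less explicit about which argument is being left open.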

If you need the mean of the non-zero values to be exactly zero (within roundoff of course), this can be done by subtracting their mean from the stored values. W.data holds only the stored (non-zero) entries, so the zeros of the matrix are left untouched:

W.data -= np.mean(W.data)

Then the mean of the non-zero values is zero to within roundoff:

W.data.mean()

-2.3718641632430623e-18
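
Putting the pieces together, a minimal sketch of the whole recipe could look like the following. The function name sparse_centered_uniform and its seed parameter are assumptions made for this example, not part of SciPy; the seed here only controls the sampled values, not the positions of the non-zeros.

```python
import numpy as np
import scipy.sparse as sp

def sparse_centered_uniform(m, n, density=0.01, seed=None):
    """Hypothetical helper: sparse CSC matrix whose stored values have zero mean."""
    rng = np.random.default_rng(seed)  # governs the values only, not the positions
    W = sp.random(m, n, density=density, format='csc',
                  data_rvs=lambda k: rng.uniform(-0.5, 0.5, size=k))
    nnz_before = W.nnz
    # W.data holds only the stored entries, so implicit zeros are untouched.
    W.data -= W.data.mean()
    assert W.nnz == nnz_before  # the sparsity pattern is unchanged
    return W

W = sparse_centered_uniform(1024, 1024, density=0.01, seed=0)
print(W.nnz, abs(W.data.mean()) < 1e-12)
```

Note that subtracting the sample mean shifts every stored value by a small amount, so a few values may end up slightly outside [-0.5, 0.5]. If the strict range matters more than an exactly zero mean, skip the subtraction and rely on the distribution being centred at zero.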