
Create a Sparse Zero Mean Random Matrix


Does anyone have experience creating a sparse matrix whose non-zero values follow a uniform distribution on [-0.5, 0.5] and have zero mean (zero-centred) in Python (e.g. using scipy.sparse)?

I am aware that the scipy.sparse package provides a few functions for creating random sparse matrices, such as 'rand' and 'random'. However, I could not achieve what I want with them. For example, I tried:

import numpy as np
import scipy.sparse as sp

s = np.random.uniform(-0.5,0.5)
W=sp.random(1024, 1024, density=0.01, format='csc', data_rvs=s)

To clarify my idea: if I wanted the matrix described above in dense (non-sparse) form, I would create it with:

dense=np.random.rand(1024,1024)-0.5

'np.random.rand(1024,1024)' creates a dense matrix with values uniformly distributed in [0,1). To make it zero mean, I centre the matrix by subtracting 0.5.

However, if I create a sparse matrix, say:

sparse=sp.rand(1024,1024,density=0.01, format='csc')

The matrix will have non-zero values uniformly distributed in [0,1). However, to centre this matrix I cannot simply do 'sparse-=0.5', because the subtraction would turn all the originally zero entries into non-zero entries.

So, how can I achieve for a sparse matrix what the example above does for a dense matrix?

Thank you for all of your help!


Solution

  • The data_rvs parameter is expecting a "callable" that takes a size. This isn't exactly obvious from the documentation. This can be done with a lambda as follows:

    import numpy as np
    import scipy.sparse as sp
    
    W = sp.random(1024, 1024, density=0.01, format='csc', 
                  data_rvs=lambda s: np.random.uniform(-0.5, 0.5, size=s))
    

    Then print(W) gives:

      (243, 0)  -0.171300809713
      (315, 0)  0.0739590145626
      (400, 0)  0.188151369316
      (440, 0)  -0.187384896218
        :   :
      (1016, 0) 0.29262088084
      (156, 1)  -0.149881296136
      (166, 1)  -0.490405135834
      (191, 1)  0.188167190147
      (212, 1)  0.0334533020488
      : :
      (411, 1)  0.122330200832
      (431, 1)  -0.0494334160833
      (813, 1)  -0.0076379249885
      (828, 1)  0.462807265425
      : :
      (840, 1021)   0.456423017883
      (12, 1022)    -0.47313075329
       :    :
      (563, 1022)   -0.477190349161
      (655, 1022)   -0.460942546313
      (673, 1022)   0.0930207181126
      (676, 1022)   0.253643616387
       :    :
      (843, 1023)   0.463793903168
      (860, 1023)   0.454427252782
    

    For the newbie, the lambda may look odd - this is just an unnamed function. The sp.random function takes an optional argument data_rvs that defaults to None. When specified, it is expected to be a function that takes a size argument and returns that number of random numbers. A simple function to do this would be:

    def generate_n_uniform_randoms(n):
        # np.random.uniform (not np.uniform) draws n values from [-0.5, 0.5)
        return np.random.uniform(-0.5, 0.5, n)
    

    I don't know the origin of the API, but the shape is not needed as sp.random presumably first figures out which indices will be non-zero, and then it just needs to compute random values for those indices, which is a set of a known size.

    The lambda is just syntactic sugar that allows us to define that function inline in terms of some other function call. We could instead write

    W = sp.random(1024, 1024, density=0.01, format='csc', 
                  data_rvs=generate_n_uniform_randoms)
    

    More generally, data_rvs can be any "callable": some object f for which f(n) returns n random values. This can be a function, but it can also be an instance of a class that implements the __call__(self, n) method. For example:

    class ufoo(object):
    
        def __call__(self, n):
            import numpy
            return numpy.random.uniform(-0.5, 0.5, n)
    
    W = sp.random(1024, 1024, density=0.01, format='csc', 
                  data_rvs=ufoo())
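
    To check the claim that data_rvs is only asked for as many values as there are non-zero entries, here is a minimal sketch. The names recording_uniform and requested_sizes are made up for this illustration, and the smaller 100 x 100 shape is just to keep it quick:

```python
import numpy as np
import scipy.sparse as sp

requested_sizes = []  # hypothetical helper list: records each size sp.random asks for


def recording_uniform(size):
    # Remember the requested count, then return that many uniform values.
    requested_sizes.append(size)
    return np.random.uniform(-0.5, 0.5, size=size)


W = sp.random(100, 100, density=0.01, format='csc',
              data_rvs=recording_uniform)

print(requested_sizes)  # the size(s) passed to the callable
print(W.nnz)            # number of stored entries in W
```

    The requested size should match the number of stored entries, which is consistent with sp.random choosing the non-zero positions itself and only delegating the values.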
    

    If you need the mean of the non-zero values to be exactly zero (within roundoff of course), subtract their mean from the stored values only. W.data holds just the stored entries, so the zeros are left untouched:

    W.data -= np.mean(W.data)
    

    Then the mean of the stored values is zero to within roundoff:

    W.data.mean()
    

    -2.3718641632430623e-18
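
    Putting the pieces together, the following is a sketch of the whole recipe under a few assumptions: the seed, the 200 x 200 shape and the name uniform_centered are arbitrary choices for the example, not anything required by scipy.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)  # seed chosen only to make the example repeatable


def uniform_centered(size):
    # Draw `size` values uniformly from [-0.5, 0.5).
    return rng.uniform(-0.5, 0.5, size=size)


W = sp.random(200, 200, density=0.01, format='csc',
              data_rvs=uniform_centered)
nnz_before = W.nnz

# Shift only the stored values; implicit zeros are not touched.
W.data -= W.data.mean()

print(W.nnz == nnz_before)          # sparsity pattern is unchanged
print(abs(W.data.mean()) < 1e-12)   # mean of stored values is ~0
```

    Note that subtracting the sample mean shifts every stored value by the same small amount, so the values may end up very slightly outside [-0.5, 0.5].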