vector, random, sparse-matrix

How to create a vector where indices of non-zero elements follow a distribution


I need to write a program that will create a vector of size N that will contain K non-zero elements according to the following requirements:

  • Non-zero elements should be mostly concentrated near the middle element (at position N/2) of the vector.
  • Elements further than D from the middle element (on either side) should be zero.
  • As we move away from the middle element, the probability that an element is non-zero should decrease.

A rather small example of what I would like to accomplish follows, where N = 40 (middle element is 20), K = 11 non-zero elements, and D = 8. Since D = 8, elements at positions > 20 + 8 = 28 and elements at positions < 20 - 8 = 12 should always be zero. In the zone where non-zeros are allowed (positions from 12 to 28) K = 11 non-zero elements are present. There are more non-zero elements close to position 20 and they become more sparse as we move further away from the middle element.

Position  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Vector    0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  1  0  1  1  1  1  1  0  1  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0

I have not written any code yet, since I cannot wrap my head around how to even start. One idea I had was to use the binomial distribution to generate random indices and set the non-zero elements there. However, this distribution can return the same index multiple times, so fewer than K non-zero elements would be produced. If I use a loop that draws new random numbers until an unused index is found, will the result still follow a binomial distribution, so that more non-zero elements end up around the middle element?

The programming language that will be used is not that important, but I would prefer something in Matlab, Python, C++ or C, as I am more familiar with them.

I hope someone can provide directions and/or examples.


Solution

  • NumPy already provides this functionality: the random generator's choice method can draw weighted samples without replacement.

    import numpy as np
    from scipy import stats
    
    N = 40
    K = 11
    

    Your vague description of the distribution you want is not adequate, so I'm just going to use a normal probability distribution with a mean of N/2 and a standard deviation of sqrt(N/2).

    center = N // 2
    scale = np.sqrt(N / 2)
    

    Create a probability vector by evaluating the probability density function at each possible index (0 to N - 1):

    p = stats.norm(loc=center, scale=scale).pdf(np.arange(N))
    

    Make sure it sums to 1:

    p /= np.sum(p)
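
The weights above do not use the cutoff D from the question. A minimal sketch of one way to honor it is to zero the weights outside the allowed window before normalizing. It assumes that a distance of exactly D is still allowed, as in the question's example where position 28 is non-zero. The names idx and D are introduced here for illustration and are not part of the code above.

```python
import numpy as np
from scipy import stats

N, K, D = 40, 11, 8        # values from the question's example
center = N // 2
scale = np.sqrt(N / 2)     # same arbitrary spread as chosen above

idx = np.arange(N)         # 0-based indices 0 .. N-1
p = stats.norm(loc=center, scale=scale).pdf(idx)

# Assumption: indices more than D away from the center get zero weight,
# so they can never be drawn.
p[np.abs(idx - center) > D] = 0.0

# Renormalize so the remaining weights sum to 1.
p /= p.sum()
```

When sampling without replacement, choice needs at least K non-zero weights, so K cannot exceed the number of indices inside the window (17 for these values).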
    

    Initialize a random number generator and call .choice() on the possible indices, with the probability distribution p, setting replace to False so that no index is drawn twice:

    rng = np.random.default_rng()
    nz_indices = rng.choice(np.arange(N), size=K, p=p, replace=False)
    
    >>> nz_indices
    array([27, 20, 23, 19, 16, 24, 13, 25, 26, 22, 21])
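
The question asks for the vector itself rather than the indices. A minimal sketch of that last step follows, using the sample output shown above as input (a fresh run will give different indices) and assuming the non-zero value is 1, as in the question's example.

```python
import numpy as np

N = 40

# Sample output copied from the run shown above; a fresh run will differ.
nz_indices = np.array([27, 20, 23, 19, 16, 24, 13, 25, 26, 22, 21])

# Start from all zeros and mark the drawn indices.
vector = np.zeros(N, dtype=int)
vector[nz_indices] = 1

print(vector)
```

The indices are 0-based, so index i corresponds to position i + 1 in the question's table.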