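I need to write a program that creates a vector of size N containing K non-zero elements, according to the following requirements:

- Elements more than D positions away from the middle element must always be zero.
- Exactly K non-zero elements are placed in the allowed zone around the middle element.
- The non-zero elements are denser close to the middle element and become sparser further away from it.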
Here is a small example of what I would like to accomplish, with N = 40 (the middle element is at position 20), K = 11 non-zero elements, and D = 8. Since D = 8, elements at positions > 20 + 8 = 28 and at positions < 20 - 8 = 12 must always be zero. In the zone where non-zeros are allowed (positions 12 to 28), K = 11 non-zero elements are present. There are more non-zero elements close to position 20, and they become sparser as we move further away from the middle element.
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Vector | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I have not written any code yet, since I cannot wrap my head around how to even start. One idea I had was to use the binomial distribution to generate random indices and set the corresponding elements to non-zero. However, this distribution can return the same index multiple times, so fewer than K non-zero elements might be produced. If I use a loop that keeps generating new random numbers until an unused index is found, will the result still follow a binomial distribution, so that more non-zero elements end up around the middle element?
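The loop proposed above can be sketched in Python with NumPy. This is a minimal sketch, not a definitive implementation: the helper name `binomial_sparse_vector`, the use of `Binomial(2*D, 0.5)` shifted onto the allowed window, and the 0-based middle index `N // 2` are all assumptions (the question counts positions from 1).

```python
import numpy as np

def binomial_sparse_vector(N, K, D, rng=None):
    """Hypothetical helper: 0/1 vector of length N with K ones within D of the middle.

    Indices are drawn from a binomial distribution centred on the middle
    element; a draw that hits an already-used index is discarded and
    redrawn, as proposed in the question.
    """
    center = N // 2  # assumed 0-based index of the middle element
    if center - D < 0 or center + D >= N:
        raise ValueError("the window [center - D, center + D] must fit inside the vector")
    if K > 2 * D + 1:
        raise ValueError("K cannot exceed the 2*D + 1 positions in the window")
    rng = np.random.default_rng() if rng is None else rng

    used = set()
    while len(used) < K:
        # Binomial(2*D, 0.5) takes values 0..2*D and peaks at D, so shifting it
        # by center - D keeps every draw inside [center - D, center + D].
        # Adding an index that is already in the set has no effect (a rejected draw).
        used.add(center - D + int(rng.binomial(2 * D, 0.5)))

    vec = np.zeros(N, dtype=int)
    vec[sorted(used)] = 1  # mark the K distinct positions
    return vec

v = binomial_sparse_vector(40, 11, 8, np.random.default_rng(0))
print(v)
```

Each accepted draw is effectively a draw from the binomial probabilities restricted to the still-unused indices and renormalized, so this loop is a form of weighted sampling without replacement. The ones still cluster around the middle, but the final inclusion probabilities are no longer proportional to the binomial probabilities: the most likely middle positions can only be picked once, so the result is flatter than the binomial shape itself.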
The programming language that will be used is not that important, but I would prefer something in Matlab, Python, C++ or C, as I am more familiar with them.
I hope someone can provide directions and/or examples.
This is existing functionality in NumPy (`Generator.choice`):
```python
import numpy as np
from scipy import stats

N = 40
K = 11
```
Your description of the distribution you want is too vague to pin down exactly, so I'm just going to use a normal probability distribution with a mean of `N/2` and a standard deviation of `sqrt(N/2)`:
```python
center = int(N / 2)
scale = np.sqrt(N / 2)
```
Create a probability vector by evaluating the probability density function at each possible index (`0` through `N - 1`):

```python
p = stats.norm(loc=center, scale=scale).pdf(np.arange(N))
```
Normalize it so that it sums to 1:

```python
p /= np.sum(p)
```
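The normal pdf is positive at every index, so this `p` still gives a small chance to positions more than D away from the middle, which the question rules out. A minimal sketch of enforcing that limit, assuming D is the half-width of the allowed window (D = 8 in the question's example), is to zero those entries before normalizing. To keep the snippet free of SciPy, it uses unnormalized Gaussian weights, which are proportional to the `stats.norm` pdf above; the constant factor cancels on normalization.

```python
import numpy as np

N, K, D = 40, 11, 8
center = N // 2  # same middle index as int(N / 2) above
scale = np.sqrt(N / 2)

idx = np.arange(N)
# Unnormalized Gaussian weights centred on the middle element.
p = np.exp(-0.5 * ((idx - center) / scale) ** 2)
# Positions more than D away from the middle must stay zero, so give them no weight.
p[np.abs(idx - center) > D] = 0.0
p /= p.sum()  # renormalize over the allowed window

rng = np.random.default_rng()
# replace=False needs at least K non-zero entries in p (here 2*D + 1 = 17).
nz_indices = rng.choice(idx, size=K, p=p, replace=False)
print(np.sort(nz_indices))
```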
Initialize a random number generator and call `.choice()` on the possible indices with the probability distribution `p`, setting `replace` to `False`:

```python
rng = np.random.default_rng()
nz_indices = rng.choice(np.arange(N), size=K, p=p, replace=False)
```

```python
>>> nz_indices
array([27, 20, 23, 19, 16, 24, 13, 25, 26, 22, 21])
```
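`choice` returns indices rather than the vector the question asks for. Setting those indices to 1 in a zero vector gives the 0/1 layout from the table in the question. In this small sketch the index array from the sample run above is hard-coded for illustration; note that these indices are 0-based, unlike the 1-based positions in the table.

```python
import numpy as np

N = 40
# Indices from the sample run above (0-based); any output of rng.choice works here.
nz_indices = np.array([27, 20, 23, 19, 16, 24, 13, 25, 26, 22, 21])

vec = np.zeros(N, dtype=int)
vec[nz_indices] = 1  # one non-zero per index; indices are distinct since replace=False
print(vec)
```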