I have a list of values x_1, x_2, ..., x_n and an expected value X. I want to write a function that takes these two things and randomly generates one of the many discrete probability distributions defined on the set {1, 2, ..., n} that meets this constraint.
Rephrasing the question: generate a vector p of length n such that
0 <= p_i <= 1 for every i
sum_i p_i = 1 (i.e. |p|_1 = 1)
p . x = X
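To be concrete, any generated p should pass a check like this (a minimal sketch; p, x, X and the tolerance are just placeholder names):

import numpy as np

def satisfies_constraints(p, x, X, tol=1e-9):
    # valid probabilities, total mass 1, and the required mean
    return (np.all(p >= -tol) and np.all(p <= 1 + tol)
            and abs(np.sum(p) - 1) < tol
            and abs(np.dot(p, x) - X) < tol)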
Using information I found here, I hacked together the solution below, but it doesn't work: the probabilities are not between 0 and 1.
import random
import numpy as np
import scipy.linalg as sla

def normal_random_distribution(data_array, data_len, X_array, expd_vl):
    # Two linear constraints: p . x = expd_vl and sum(p) = 1
    e_mat = np.concatenate(([X_array], [np.ones(data_len)]), axis=0)
    f_mat = np.array([expd_vl, 1])
    # Particular solution, plus a basis of the null space of e_mat
    v_vec = np.linalg.lstsq(e_mat, f_mat, rcond=None)[0]
    e_null = sla.null_space(e_mat)
    # Bounds on lamda that were supposed to keep the entries in [0, 1]
    lamda_lower = np.linalg.pinv(e_null) @ (-1 * v_vec)
    lamda_upper = np.linalg.pinv(e_null) @ (1 - v_vec)
    lamda = lamda_lower + random.random() * (lamda_upper - lamda_lower)
    sol = v_vec + e_null @ lamda
    return sol
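For example, a call like this (the inputs are made up, and note that data_array isn't actually used by this version) returns a sol whose entries sum to 1 but can fall outside [0, 1]:

x = np.sort(np.random.uniform(0, 10, 20))            # illustrative support values
sol = normal_random_distribution(None, x.size, x, 4.0)
print(sol.sum(), sol @ x)                             # ~1 and ~4.0
print(sol.min(), sol.max())                           # entries can be < 0 or > 1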
How can I generate p? The sol sums to 1 and satisfies sol . x = X, and I was hoping that interpolating between lamda_lower and lamda_upper would keep the entries between 0 and 1, but it does not. I want the function to generate, on every call, a random one of the many possible solutions. I realise one obvious solution would be to find an x_i < X and an x_j > X and interpolate between the two (sketched below), but I want the function to (in theory) be able to generate all the possible distributions.
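For reference, that two-point idea would be something like the following sketch (assuming min(x) <= X <= max(x)); it is simple, but it only ever produces distributions supported on two points:

import numpy as np

def two_point_distribution(x, X):
    # pick one support value below X and one above it, then split the mass so the mean is X
    i = np.random.choice(np.flatnonzero(x <= X))
    j = np.random.choice(np.flatnonzero(x >= X))
    p = np.zeros(x.size)
    if x[i] == x[j]:                    # X coincides with a support value
        p[i] = 1.0
        return p
    w = (X - x[i]) / (x[j] - x[i])      # weight on x[j] so that p . x == X
    p[i], p[j] = 1 - w, w
    return p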
After trying to create a recursive solution for hours, I gave up and coded an iterative, brute-force-ish approach myself.
import random
import numba
import numpy as np

@numba.jit(nogil=True)
def normal_random_distribution(data_array, X_array, expd_vl):
    EPSILON = 0.001
    data_len = data_array.size
    # Start from the uniform distribution and nudge it towards the target mean
    data_array[:] = 1 / data_len
    current_E = np.sum(data_array * X_array)
    # X_array is assumed sorted; `middle` is the last index with X_array <= expd_vl
    middle = np.argmax(X_array > expd_vl) - 1
    while abs(current_E - expd_vl) > EPSILON:
        # Pick one index below the target mean and one above it
        left_i = random.randint(0, middle)
        rght_i = random.randint(middle + 1, data_len - 1)
        if current_E < expd_vl:
            # Move mass from the left index to the right index to raise the mean
            up_lim = min(data_array[left_i], (expd_vl - current_E) / (X_array[rght_i] - X_array[left_i]))
            mu = up_lim / 2
            sigma = up_lim / 6
            x = np.clip(np.random.normal(mu, sigma, 1), 0, up_lim)[0]
            data_array[left_i] -= x
            data_array[rght_i] += x
            current_E += x * (X_array[rght_i] - X_array[left_i])
        else:
            # Move mass from the right index to the left index to lower the mean
            up_lim = min(data_array[rght_i], (current_E - expd_vl) / (X_array[rght_i] - X_array[left_i]))
            mu = up_lim / 2
            sigma = up_lim / 6
            x = np.clip(np.random.normal(mu, sigma, 1), 0, up_lim)[0]
            data_array[left_i] += x
            data_array[rght_i] -= x
            current_E -= x * (X_array[rght_i] - X_array[left_i])
    # data_array now holds the distribution (modified in place)
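For completeness, this is roughly how I call it (values made up; X_array has to be sorted and expd_vl has to lie strictly between its smallest and largest value, otherwise middle is meaningless):

x = np.sort(np.random.uniform(0, 10, 50))  # sorted, illustrative support values
p = np.empty(x.size)
normal_random_distribution(p, x, 4.0)      # fills p in place
print(p.sum(), p @ x)                      # ~1 and ~4.0 (to within EPSILON)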