Search code examples
pythonnumpyrandomscipyprobability

Given a list of random variable and an expected value, how to generate probability distribution in python


I have a list of random values x_1,x_2, ... x_n and an expected value X.I wanna write a function that takes these 2 things and randomly generates one of the many discrete probability distributions defined on the set {1,2....n} that meets the above mentioned constraint.

Rephrasing the question as generate a vector p of length n such that

0 <= p_i < =1

| p | = 1

p.x = X

Using information I found here I hacked together the below solution, but it doesn't work as the probabilities are not b/w 0 and 1

def normal_random_distribution(data_array, data_len, X_array, expd_vl):
    e_mat = np.concatenate(([X_array], [np.ones(data_len)]), axis = 0)
    f_mat = np.array([expd_vl, 1])
    v_vec = np.linalg.lstsq(e_mat, f_mat,rcond=None)[0]
    e_null = sla.null_space(e_mat)
    lamda_lower = np.linalg.pinv(e_null) @ (-1 * v_vec)
    lamda_upper = np.linalg.pinv(e_null) @ ( 1 - v_vec)
    lamda = lamda_lower + random.random()*(lamda_upper - lamda_lower)
    sol = v_vec + e_null @ lamda
    return sol

How can I generate p ? The sol has l1 norm 1, and |sol * x| = X. I was hoping interpolating b/w lamda_lower and lamda_upper would make it b/w 0 and 1 but it does not. I wanna write the function so that on every call it generates randomly one of the many possible solutions. I realise one of the obvious solutions would be to find an x_i < X and and x_j > X and interpolate b/w the 2, but I want the function to (in theory) be able to generate all the possible distributions.


Solution

  • After trying to create a recursive solution for hours I gave up and coded an iterative brute force ish approach myself.

    @numba.jit(nogil = True)  
    def normal_random_distribution(data_array, X_array, expd_vl):
        # print("here")
        EPSILON = 0.001
        data_len = data_array.size
        data_array[:] = 1 / data_len
        current_E = np.sum(data_array * X_array)
        middle = np.argmax(X_array > expd_vl) - 1
        while abs(current_E - expd_vl) > EPSILON:
            # print(abs(current_E - expd_vl) / expd_vl)
            left_i = random.randint( 0, middle )
            rght_i = random.randint( middle + 1, data_len - 1 )
            if current_E < expd_vl:
                up_lim = min(data_array[left_i] , ( expd_vl - current_E ) / ( X_array[rght_i] - X_array[left_i] ) )
                mu = up_lim / 2
                sigma = up_lim / 6
                x = np.clip(np.random.normal( mu, sigma, 1 ), 0, up_lim)[0]
                data_array[left_i] -= x
                data_array[rght_i] += x
                current_E += x * ( X_array[rght_i] - X_array[left_i] )
            else:
                up_lim = min(data_array[rght_i] , ( current_E - expd_vl ) / ( X_array[rght_i] - X_array[left_i] ) ) 
                mu = up_lim / 2
                sigma = up_lim / 6
                x = np.clip(np.random.normal( mu, sigma, 1 ), 0, up_lim)[0]
                data_array[left_i] += x
                data_array[rght_i] -= x
                current_E -= x * ( X_array[rght_i] - X_array[left_i] )