Search code examples
pythonnumpystatisticssimulationdistribution

Simulating expectation of continuous random variable


Currently I want to generate some samples to get expectation & variance of it.

Given the probability density function: f(x) = {2x, 0 <= x <= 1; 0 otherwise}

I already found that E(X) = 2/3, Var(X) = 1/18, my detail solution is from here https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable

But here is what I have when simulating using python:

import numpy as np
N = 100_000
X = np.random.uniform(size=N, low=0, high=1)
Y = [2*x for x in X]
np.mean(Y) # 1.00221 <- not equal to 2/3
np.var(Y) # 0.3323 <- not equal to 1/18

What am I doing wrong here? Thank you in advanced.


Solution

  • To approximate the integral of some function of x, say, g(x), over S = [0, 1], using Monte Carlo simulation, you

    • generate N random numbers in [0, 1] (i.e. draw from the uniform distribution U[0, 1])

    • calculate the arithmetic mean of g(x_i) over i = 1 to i = N where x_i is the ith random number: i.e. (1 / N) times the sum from i = 1 to i = N of g(x_i).

    The result of step 2 is the approximation of the integral.

    The expected value of continuous random variable X with pdf f(x) and set of possible values S is the integral of x * f(x) over S. The variance of X is the expected value of X-squared minus the square of the expected value of X.

    • Expected value: to approximate the integral of x * f(x) over S = [0, 1] (i.e. the expected value of X), set g(x) = x * f(x) and apply the method outlined above.
    • Variance: to approximate the integral of (x * x) * f(x) over S = [0, 1] (i.e. the expected value of X-squared), set g(x) = (x * x) * f(x) and apply the method outlined above. Subtract the result of this by the square of the estimate of the expected value of X to obtain an estimate of the variance of X.

    Adapting your method:

    import numpy as np
    N = 100_000
    X = np.random.uniform(size = N, low = 0, high = 1)
    
    Y = [x * (2 * x) for x in X]
    E = [(x * x) * (2 * x) for x in X]
    
    # mean
    print((a := np.mean(Y)))
    # variance 
    print(np.mean(E) - a * a) 
    

    Output

    0.6662016482614397
    0.05554821798023696
    

    Instead of making Y and E lists, a much better approach is

    Y = X * (2 * X)
    E = (X * X) * (2 * X)
    

    Y, E in this case are numpy arrays. This approach is much more efficient. Try making N = 100_000_000 and compare the execution times of both methods. The second should be much faster.