python numpy statistics simulation distribution

Simulating expectation of continuous random variable

Currently I want to generate some samples to get expectation & variance of it.

Given the probability density function: f(x) = {2x, 0 <= x <= 1; 0 otherwise}

I already found that E(X) = 2/3, Var(X) = 1/18, my detail solution is from here https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable

But here is what I have when simulating using python:

import numpy as np
N = 100_000
X = np.random.uniform(size=N, low=0, high=1)
Y = [2*x for x in X]
np.mean(Y) # 1.00221 <- not equal to 2/3
np.var(Y) # 0.3323 <- not equal to 1/18

What am I doing wrong here? Thank you in advanced.

Solution

To approximate the integral of some function of x, say, g(x), over S = [0, 1], using Monte Carlo simulation, you

generate N random numbers in [0, 1] (i.e. draw from the uniform distribution U[0, 1])
calculate the arithmetic mean of g(x_i) over i = 1 to i = N where x_i is the ith random number: i.e. (1 / N) times the sum from i = 1 to i = N of g(x_i).

The result of step 2 is the approximation of the integral.

The expected value of continuous random variable X with pdf f(x) and set of possible values S is the integral of x * f(x) over S. The variance of X is the expected value of X-squared minus the square of the expected value of X.

Expected value: to approximate the integral of x * f(x) over S = [0, 1] (i.e. the expected value of X), set g(x) = x * f(x) and apply the method outlined above.
Variance: to approximate the integral of (x * x) * f(x) over S = [0, 1] (i.e. the expected value of X-squared), set g(x) = (x * x) * f(x) and apply the method outlined above. Subtract the result of this by the square of the estimate of the expected value of X to obtain an estimate of the variance of X.

Adapting your method:

import numpy as np
N = 100_000
X = np.random.uniform(size = N, low = 0, high = 1)

Y = [x * (2 * x) for x in X]
E = [(x * x) * (2 * x) for x in X]

# mean
print((a := np.mean(Y)))
# variance 
print(np.mean(E) - a * a)

Output

0.6662016482614397
0.05554821798023696

Instead of making Y and E lists, a much better approach is

Y = X * (2 * X)
E = (X * X) * (2 * X)

Y, E in this case are numpy arrays. This approach is much more efficient. Try making N = 100_000_000 and compare the execution times of both methods. The second should be much faster.