Currently I want to generate some samples to get expectation & variance of it.
Given the probability density function: f(x) = {2x, 0 <= x <= 1; 0 otherwise}
I already found that E(X) = 2/3, Var(X) = 1/18, my detail solution is from here https://math.stackexchange.com/questions/4430163/simulating-expectation-of-continuous-random-variable
But here is what I have when simulating using python:
import numpy as np
N = 100_000
X = np.random.uniform(size=N, low=0, high=1)
Y = [2*x for x in X]
np.mean(Y) # 1.00221 <- not equal to 2/3
np.var(Y) # 0.3323 <- not equal to 1/18
What am I doing wrong here? Thank you in advanced.
To approximate the integral of some function of x
, say, g(x)
, over S = [0, 1]
, using Monte Carlo simulation, you
generate N
random numbers in [0, 1]
(i.e. draw from the uniform distribution U[0, 1]
)
calculate the arithmetic mean of g(x_i)
over i = 1
to i = N
where x_i
is the i
th random number: i.e. (1 / N)
times the sum from i = 1
to i = N
of g(x_i)
.
The result of step 2 is the approximation of the integral.
The expected value of continuous random variable X
with pdf f(x)
and set of possible values S
is the integral of x * f(x)
over S
. The variance of X
is the expected value of X
-squared minus the square of the expected value of X
.
x * f(x)
over S = [0, 1]
(i.e. the expected value of X
), set g(x) = x * f(x)
and apply the method outlined above.(x * x) * f(x)
over S = [0, 1]
(i.e. the expected value of X
-squared), set g(x) = (x * x) * f(x)
and apply the method outlined above. Subtract the result of this by the square of the estimate of the expected value of X
to obtain an estimate of the variance of X
.Adapting your method:
import numpy as np
N = 100_000
X = np.random.uniform(size = N, low = 0, high = 1)
Y = [x * (2 * x) for x in X]
E = [(x * x) * (2 * x) for x in X]
# mean
print((a := np.mean(Y)))
# variance
print(np.mean(E) - a * a)
Output
0.6662016482614397
0.05554821798023696
Instead of making Y
and E
lists, a much better approach is
Y = X * (2 * X)
E = (X * X) * (2 * X)
Y
, E
in this case are numpy
arrays. This approach is much more efficient. Try making N = 100_000_000
and compare the execution times of both methods. The second should be much faster.