
Rejection sampling not giving the correct distribution


I'm writing code to perform rejection sampling from a Cauchy distribution.

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 100)

dist = scipy.stats.cauchy
global_max = dist.pdf(0)  # maximum of the PDF, attained at x = 0

This is straightforward. The default parameters of the Cauchy distribution in scipy.stats are loc=0, scale=1. In the code below, I generate a million random points and accept only those whose y-coordinate lies below the curve, i.e. is less than the Cauchy PDF value at the corresponding x-coordinate.

num_samples = 1000000

sample_y = np.random.uniform(0, global_max, size=num_samples)
sample_x = np.random.uniform(-4, 4, size=num_samples)

X = sample_x[sample_y <= dist.pdf(sample_x)]
params = scipy.stats.cauchy.fit(X)

Finally I compare the actual distribution to the estimated one:

print('Estimated parameters =', params)

plt.hist(X, bins=50, density=True, alpha=0.3)
plt.plot(x, scipy.stats.cauchy(params[0], params[1]).pdf(x), color='red')
plt.plot(x, dist.pdf(x), color='green');

Output:

Actual parameters    = (0, 1)
Estimated parameters = (-0.0030743926369336217, 0.7362620502669363)

(plot: histogram of the accepted samples, with the fitted PDF in red and the true PDF in green)

I'm unable to understand why this is happening. The estimated scale parameter (about 0.736) is significantly smaller than the true value of 1. What am I missing here?


Solution

  • The Cauchy distribution is known for its "fat tails", but you limit your samples to the [-4, 4] range, which is too narrow.
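Since the standard Cauchy CDF has the closed form F(x) = 1/2 + arctan(x)/π, you can check directly how much probability mass a given sampling window throws away (a quick sketch using only NumPy):

```python
import numpy as np

def mass_outside(a):
    """P(|X| > a) for a standard Cauchy variable: 1 - 2*arctan(a)/pi."""
    return 1.0 - 2.0 * np.arctan(a) / np.pi

print(mass_outside(4.0))     # ~0.156: the [-4, 4] box discards ~15.6% of the mass
print(mass_outside(2000.0))  # ~0.0003: a [-2000, 2000] box discards almost nothing
```

Losing that much of the tails pulls the fitted scale well below 1, which matches the 0.736 estimate seen above.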

    Try a bigger range, e.g.:

    num_samples = 100000000
    max_x = 2000
    
    sample_y = np.random.uniform(0, global_max, size=num_samples)
    sample_x = np.random.uniform(-max_x, max_x, size=num_samples)
    
    X = sample_x[sample_y <= dist.pdf(sample_x)]
    params = scipy.stats.cauchy.fit(X)
    
    # Clip to [-4, 4] for plotting only; the fit uses all of X
    _X = X[np.logical_and(X < 4, X > -4)]
    
    plt.hist(_X, bins=50, density=True, alpha=0.3)
    plt.plot(x, scipy.stats.cauchy(params[0], params[1]).pdf(x), color='red')
    plt.plot(x, dist.pdf(x), color='green');

    The fitted params are now (-0.008691511218505928, 0.9918572787654598); the scale estimate is close to the true value of 1.

    (plot: histogram of the clipped samples, fitted PDF in red, true PDF in green)
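The wider box comes at a cost: most proposals now fall in regions where the PDF is tiny and get rejected, which is why num_samples was raised to 10^8. The expected acceptance rate can be sketched from the box geometry (plain NumPy, assuming the same max_x as above):

```python
import numpy as np

max_x = 2000
global_max = 1.0 / np.pi  # equals scipy.stats.cauchy.pdf(0)

# Accepted fraction = (area under the PDF over [-max_x, max_x])
#                   / (area of the uniform box, 2 * max_x * global_max)
pdf_area = 2.0 * np.arctan(max_x) / np.pi   # P(-max_x < X < max_x), ~0.9997
box_area = 2.0 * max_x * global_max
accept_rate = pdf_area / box_area

print(accept_rate)             # ~0.000785
print(int(1e8 * accept_rate))  # ~78500 accepted samples out of 10**8 proposals
```

So even with 10^8 proposals, only on the order of 10^5 samples survive, which is still plenty for the fit.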

    If you don't like how it looks, compare it with a reference sample from NumPy's built-in Cauchy-distributed PRNG:

    ref_x = np.random.standard_cauchy(size=int(num_samples/max_x))
    plt.hist(ref_x[np.logical_and(ref_x < 4, ref_x > -4)], bins=50, density=True, alpha=0.3)
    plt.plot(x, dist.pdf(x), color='green');
    

    (plot: histogram of the reference samples with the true PDF in green)

    The two plots look very much alike, don't they?
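To go beyond eyeballing, a Kolmogorov-Smirnov test against the standard Cauchy CDF can quantify the agreement (a sketch; the sample is regenerated here with a seeded generator, and the exact statistic will vary with the seed):

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(42)
num_samples = 1000000   # smaller than above, still ~800 accepted samples
max_x = 2000
global_max = scipy.stats.cauchy.pdf(0)

# Rejection sampling over the wide box, as in the answer
sample_y = rng.uniform(0, global_max, size=num_samples)
sample_x = rng.uniform(-max_x, max_x, size=num_samples)
X = sample_x[sample_y <= scipy.stats.cauchy.pdf(sample_x)]

# A small KS statistic (and a non-tiny p-value) means the accepted
# samples are consistent with a standard Cauchy distribution.
stat, pvalue = scipy.stats.kstest(X, scipy.stats.cauchy.cdf)
print(len(X), stat, pvalue)
```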