Search code examples
pythonrandomuniform-distribution

Deviation from expected value of mean distance between 2 points 'on ' a sphere


I was trying to verify the mean distance between 2 points in various 3-D and 2-D structures by taking average of multiple random cases. Almost all the time, I was getting a pretty good accuracy except for the case of points on the surface of a sphere. My code uses Gaussian distribution inspired from this answer (see the second most up voted answer)

Here is the python code:

import math as m
from random import uniform as u
sum = 0
for i in range(10000):
    x1 = u(-1, 1)
    y1 = u(-1, 1)
    x2 = u(-1, 1)
    y2 = u(-1, 1)
    z1 = u(-1, 1)
    z2 = u(-1, 1)
    if x1 == y1 == z1 == 0:
        sum += m.sqrt((x2) ** 2 + (y2) ** 2 + (z2) ** 2)
    elif x2 == y2 == z2 == 0:
        sum += m.sqrt((x1) ** 2 + (y1) ** 2 + (z1) ** 2)
    else:
        x1 /= m.sqrt(x1 ** 2 + y1 ** 2 + z1 ** 2)
        y1 /= m.sqrt(x1 ** 2 + y1 ** 2 + z1 ** 2)
        z1 /= m.sqrt(x1 ** 2 + y1 ** 2 + z1 ** 2)
        x2 /= m.sqrt(x2 ** 2 + y2 ** 2 + z2 ** 2)
        y2 /= m.sqrt(x2 ** 2 + y2 ** 2 + z2 ** 2)
        z2 /= m.sqrt(x2 ** 2 + y2 ** 2 + z2 ** 2)
        sum += m.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1-z2) ** 2)
print(sum/10000)

The expected value is 4/3 which is shown here

Arguably the absolute difference is not very large. But the percentage deviation from expected value is around 1% on any run. On the other hand, in all other similar programs with other shapes and same number of random cases, the % deviation is around 0.05% on average.

Also, the value that the code returns is always less than 4/3. This is my major concern.

My guess is that I have implemented the algorithm in a wrong way. Any help is appreciated.

Edit:

After realizing the mistake made in the previous method, I now, first use rejection sampling to get the points lying inside the sphere. This will ensure that after dividing the point vectors with their norms, the resulting unit vector distribution will be uniform. In spite of doing that, I am getting a different result, which is unexpectedly more deviated from expected than the previous one. To be more precise, the limit approaches 1.25 with this algorithm.

Here is the code:

sum2 = 0
size = 0
for t in range(10000):  # Attempt 2
    x1 = u(-1, 1)
    y1 = u(-1, 1)
    x2 = u(-1, 1)
    y2 = u(-1, 1)
    z1 = u(-1, 1)
    z2 = u(-1, 1)
    if (x1**2 + y1**2 + z1**2)>1 or (x2**2 + y2**2 + z2**2)>1 or x1==y1==z1==0 or x2==y2==z2==0: continue
    size += 1
    x1 /= m.sqrt(x1 ** 2 + y1 ** 2 + z1 ** 2)
    y1 /= m.sqrt(x1 ** 2 + y1 ** 2 + z1 ** 2)
    z1 /= m.sqrt(x1 ** 2 + y1 ** 2 + z1 ** 2)
    x2 /= m.sqrt(x2 ** 2 + y2 ** 2 + z2 ** 2)
    y2 /= m.sqrt(x2 ** 2 + y2 ** 2 + z2 ** 2)
    z2 /= m.sqrt(x2 ** 2 + y2 ** 2 + z2 ** 2)
    sum2 += m.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
print(size)
print(sum2/size)

Solution

  • The initial random values for the two points are contained within a cube, rather than a sphere. After scaling each vector by 1/length, the vectors are on the unit sphere, but they are not evenly distributed across the surface of the sphere.

    You will tend to get more vectors near the corners of the cube, compared to the centre of each face. Since the vectors tend to cluster in regions, the average value of the distance between them is less than 4/3.

    This will do the trick: https://mathworld.wolfram.com/SpherePointPicking.html


    This code works for me:

    from math import sqrt
    from random import uniform
    
    
    sum2 = 0
    size = 0
    while size < 100000:
        x1 = uniform(-1, 1)
        y1 = uniform(-1, 1)
        x2 = uniform(-1, 1)
        y2 = uniform(-1, 1)
        z1 = uniform(-1, 1)
        z2 = uniform(-1, 1)
    
        r1 = sqrt(x1**2 + y1**2 + z1**2)
        r2 = sqrt(x2**2 + y2**2 + z2**2)
        if r1 > 1 or r2 > 1 or x1==y1==z1==0 or x2==y2==z2==0: continue
        size += 1
        x1 /= r1
        y1 /= r1
        z1 /= r1
        x2 /= r2
        y2 /= r2
        z2 /= r2
        sum2 += sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    print(sum2/size)
    

    Output was:

    1.3337880809331075