Search code examples
algorithmmathcomputational-geometryeuclidean-distance

Generate P random N-dimensional points from list of ALL possible pairwise distances


I would like to generate random N-dimensional points with the constraint of having precise Euclidean distances (known) between each other.

Number of points P = 100 Number of dimensions of the hyperspace N = 512

Consequently, the possible number of pairwise distances is given by the formula L = P*(P-1)/2. If P = 100, then L = 4950.

Let's say I have a list of 4950 distance values, where each value refers to a precise point-point combination.

Is it possible to implement this using numpy?

It is trivial to do it when considering pairs of points (P = 2) as L = 1, but I'm trying to figure out if it can it be generalized to higher values of P?

This is my implementation for P = 2, considering set_dist as the desired distance value.

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

N = 512

set_dist = 5.

point_0 = np.random.rand(N).reshape(1, -1)
point_1 = np.random.rand(N).reshape(1, -1)
rand_dist = euclidean_distances(point_0, point_1)
point_0 = point_0 * set_dist / rand_dist
point_1 = point_1 * set_dist / rand_dist

Solution

  • Well, you asked to do numpy.

    import numpy as np
    

    I will be using d with d[i][j] being the distance from point i to point j. It needs to be symmetric, with 0 on the diagonal. I will leave it up to you to get from your distance list to that matrix.

    Our first task will be to find our points in some coordinate system, and then randomize it. So we'll focus on that.

    Suppose we have coordinates with only a few dimensions filled out. For example in 3 dimensions we know only the first coordinates. For all the points, we want to project them to the dimensions that we don't know, and find distances there. Here is a utility function to do that using the Pythagorean Theorem.

    def project_distances(d, coords, point, epsilon=10**-10):
        points = d.shape[0]
        point_coords = coords[point].reshape(1, coords.shape[1])
        known_d2 = np.sum(np.square(coords - point_coords), axis=1)
        project_d2 = np.square(d[point]) - known_d2
        # Fix potential roundoff errors.
        project_d2 = np.where(project_d2 < epsilon, 0, project_d2)
        return np.sqrt(project_d2)
    

    Now suppose we've done that projection. How do we find our next coordinate?

    Well we'll take point 0 arbitrarily and put it at the origin. And then for a second point we'll find the one that is farthest from it. Using Heron's formula we can find for every point all of the distances not explained by the new dimension. That's what is at right angles to our new dimension. We can use Pythagoras to find the absolute value along the new dimension. And then by comparing with the second point, we can figure out if it is a + or a -.

    def distances_to_normalized_coords (d, n, epsilon=10**-10):
        points = d.shape[0]
        # Start with everything 0.
        coords = np.zeros((points, n))
        project_d_from_0 = d[0]
        for dim in range(n):
            # Find the farthest away point to use to orient this dim.
            best_point = 0
            best_d = 0
            for point in range(1, points):
                if best_d < project_d_from_0[point]:
                    best_point = point
                    best_d = project_d_from_0[point]
            if best_d < epsilon:
                break # We're done.
    
            project_d_from_best = project_distances(d, coords, best_point, epsilon)
            # Use Heron's formula to find areas of triangles from 0 to best_point, to any point.
            s = (best_d + project_d_from_0 + project_d_from_best) / 2
            area2 = s * (s - best_d) * (s - project_d_from_0) * (s - project_d_from_best)
            area2 = np.where(area2 < epsilon, 0, area2)
            area = np.sqrt(area2)
            # area = base * height / 2. The base is best_d. Find the height.
            height = 2 * area / best_d
            # height is at right angles to this dimension. Find this dimension.
            distance2 = np.square(project_d_from_0) - np.square(height)
            distance2 = np.where(distance2 < epsilon, 0, distance2)
            distance = np.sqrt(distance2)
    
            # Actual coordinates can be +- distance. Which better eplains project_d_from_best?
            predict_best_d_plus = np.sqrt(np.square(height) + np.square(best_d - distance))
            predict_best_d_minus = np.sqrt(np.square(height) + np.square(best_d + distance))
            column = np.where(np.abs(predict_best_d_plus - project_d_from_best) <= np.abs(predict_best_d_minus - project_d_from_best), distance, -distance)
    
            # Copy the column to the coordinates.
            for point in range(1, points):
                coords[point][dim] = column[point]
    
            # And reproject with current coords.
            project_d_from_0 = project_distances(d, coords, 0, epsilon)
    
        return coords
    

    OK, now we know a single set of points with the right distances. Let's now move it around randomly.

    from scipy.stats import ortho_group
    def randomly_position(coords, deviation):
        points, n = coords.shape
        center = np.sum(coords, axis=0).reshape(1, n) / points
        rotate = ortho_group(coords.shape[1]).rvs()
        return np.matmul(coords - center, rotate) + deviation * np.random.randn(1, n)
    

    And as an example, let's do a cube.

    def cube_corners ():
        for i in (0, 1):
            for j in (0, 1):
                for k in (0, 1):
                    yield (i, j, k)
    
    def cube_distances():
        distances = []
        for i1, j1, k1 in cube_corners():
            row = []
            for i2, j2, k2 in cube_corners():
                row.append(((i1-i2)**2 + (j1-j2)**2 + (k1-k2)**2)**0.5)
            distances.append(row)
        return np.array(distances)
    
    distances = cube_distances()
    coords = distances_to_normalized_coords(distances, 3)
    print(randomly_position(coords, 5))
    

    And if you want to actually check that the answer makes sense.

    for i in range(len(distances)):
        for j in range(len(distances[i])):
            print(i, j, distances[i][j], (np.sum(np.square(coords[i] - coords[j])))**0.5)
    

    For debugging purposes, here is the whole program and a little test code.

    import numpy as np
    
    def project_distances(d, coords, point, epsilon=10**-10):
        points = d.shape[0]
        point_coords = coords[point].reshape(1, coords.shape[1])
        known_d2 = np.sum(np.square(coords - point_coords), axis=1)
        project_d2 = np.square(d[point]) - known_d2
        # Fix potential roundoff errors.
        project_d2 = np.where(project_d2 < epsilon, 0, project_d2)
        return np.sqrt(project_d2)
    
    def distances_to_normalized_coords (d, n, epsilon=10**-10):
        points = d.shape[0]
        # Start with everything 0.
        coords = np.zeros((points, n))
        project_d_from_0 = d[0]
        for dim in range(n):
            # Find the farthest away point to use to orient this dim.
            best_point = 0
            best_d = 0
            for point in range(1, points):
                if best_d < project_d_from_0[point]:
                    best_point = point
                    best_d = project_d_from_0[point]
            if best_d < epsilon:
                break # We're done.
    
            project_d_from_best = project_distances(d, coords, best_point, epsilon)
            # Use Heron's formula to find areas of triangles from 0 to best_point, to any point.
            s = (best_d + project_d_from_0 + project_d_from_best) / 2
            area2 = s * (s - best_d) * (s - project_d_from_0) * (s - project_d_from_best)
            area2 = np.where(area2 < epsilon, 0, area2)
            area = np.sqrt(area2)
            # area = base * height / 2. The base is best_d. Find the height.
            height = 2 * area / best_d
            # height is at right angles to this dimension. Find this dimension.
            distance2 = np.square(project_d_from_0) - np.square(height)
            distance2 = np.where(distance2 < epsilon, 0, distance2)
            distance = np.sqrt(distance2)
    
            # Actual coordinates can be +- distance. Which better eplains project_d_from_best?
            predict_best_d_plus = np.sqrt(np.square(height) + np.square(best_d - distance))
            predict_best_d_minus = np.sqrt(np.square(height) + np.square(best_d + distance))
            column = np.where(np.abs(predict_best_d_plus - project_d_from_best) <= np.abs(predict_best_d_minus - project_d_from_best), distance, -distance)
    
            # Copy the column to the coordinates.
            for point in range(1, points):
                coords[point][dim] = column[point]
    
            # And reproject with current coords.
            project_d_from_0 = project_distances(d, coords, 0, epsilon)
    
        return coords
    
    from scipy.stats import ortho_group
    def randomly_position(coords, deviation):
        points, n = coords.shape
        center = np.sum(coords, axis=0).reshape(1, n) / points
        rotate = ortho_group(coords.shape[1]).rvs()
        return np.matmul(coords - center, rotate) + deviation * np.random.randn(1, n)
    
    def random_points (count, dims):
        return np.random.randn(count, dims)
    
    def points_to_distances (points):
        return np.array([
            [(np.sum(np.square(points[i] - points[j])))**0.5 for j in range(len(points))]
            for i in range(len(points))])
    
    for _ in range(100):
        points = random_points(8, 3)
        distances = points_to_distances(points)
        coords = distances_to_normalized_coords(distances, 3)
        coords = randomly_position(coords, 1)
        for i in range(len(distances)):
            for j in range(len(distances[i])):
                coord_distance = (np.sum(np.square(coords[i] - coords[j])))**0.5
                if 0.00000001 < abs(coord_distance - distances[i][j]):
                    print("points", points)
                    print("distances", distances)
                    print("coords", coords)
                    print(i, j, distances[i][j], (np.sum(np.square(coords[i] - coords[j])))**0.5)