Search code examples
pythonarraysalgorithmpython-2.7math

Sort the coordinates of points by areas that they occupy (Python)


Suppose we have a particular number of points distributed in the plane at the nodes of a grid having a given step. These points form several continuous regions.

We can tell that points (if there is more than one point) form a continuous region if for each point in region there is another point at a distance less than or equal to step*sqrt(2).

It is necessary to have only a single list (or dictionary) of coordinates and grid step, to distribute the coordinates to new arrays. Each new array should contain coordinates that belong to a particular region.

Below is a picture for clarity. In this case, we have three continuous regions.

Image

Now I have the following implementation.

from itertools import groupby


step = 1
regions = [] # list of lists with points from particular regions
while all_equal(list(points.values())) == False: # while not all points have been explored ("points" is a single dict with coordinates, where keys are numbers of points and values are tuples with coordinates)
    region = [] # create a list for new region
    region.append(points[list(points.keys())[0]]) # add to the new list the first value from "points"
    points[list(points.keys())[0]] = None # "None" means that this point has already been explored
    for point in points:
        for region_point in region:
            if points[point] != None: # if this point has not already been explored
                if distance(points[point], region_point) <= step*2**0.5: # if these points form a continuous segment
                    region.append(points[point])
                    points[point] = None
                    break
    regions.append(region)
    points = {k:v for k,v in points.items() if v != None} # remove all explored points from a single dict

for element in regions:
    print element


def all_equal(array):
    g = groupby(array)
    return next(g, True) and not next(g, False)

But this code works incorrectly. Moreover, it is organized in a very suboptimal way. But now I have no idea how it could be realized differently.

An example of input data:

    points = {
        1: (1, 10),
        2: (1, 8),
        3: (2, 9),
        4: (2, 6),
        5: (3, 8),
        6: (3, 7),
        7: (3, 6),
        8: (6, 7),
        9: (7, 8),
        10: (7, 7),
        11: (8, 7),
        12: (8, 6),
        13: (10, 5),
        14: (11, 6),
        15: (12, 6),
        16: (12, 5),
        17: (13, 6),
        18: (13, 5)
    }

Expected result:

[(1, 10), (1, 8), (2, 9), (2, 6), (3, 8), (3, 7), (3, 6)]
[(6, 7), (7, 8), (7, 7), (8, 7), (8, 6)]
[(10, 5), (11, 6), (12, 6), (12, 5), (13, 6), (13, 5)]

Current result:

[(1, 10), (2, 9), (3, 8), (3, 7), (3, 6)]
[(1, 8)]
[(2, 6)]
[(6, 7), (7, 8), (7, 7), (8, 7), (8, 6)]
[(10, 5), (11, 6), (12, 6), (12, 5), (13, 6), (13, 5)]

Please use Python 2 to solve this problem (or Python 3 without libraries that are missing in Python 2).


Solution

  • This does the trick (tried on Python 2.7.18, works fine, and just so happens to be artisanal code that works on Python 3.11 too).

    The idea is that we do a sort of "flood-fill" or walk across the cells.

    The output is

    [(1, 10), (1, 8), (2, 9), (2, 6), (3, 8), (3, 7), (3, 6)]
    [(6, 7), (7, 8), (7, 7), (8, 7), (8, 6)]
    [(10, 5), (11, 6), (12, 6), (12, 5), (13, 6), (13, 5)]
    

    as expected.

    def get_neighbors(point_id_to_coords, open_point_ids, point_id):
        x, y = point_id_to_coords[point_id]
        for neighbor in open_point_ids:
            nx, ny = point_id_to_coords[neighbor]
            if abs(x - nx) <= 1 and abs(y - ny) <= 1:
                yield neighbor
    
    
    def cluster_points(point_id_to_coords):
        # Create a set of all point IDs that haven't been visited yet
        open_point_ids = set(point_id_to_coords)
    
        # While there are unclustered points, choose a random point
        while open_point_ids:
            start = open_point_ids.pop()
    
            # Initialize a new cluster, and its open points with only the start point
            cluster = {start}
            cluster_open = {start}
    
            # While there are still points to be examined for this cluster, choose a random point
            while cluster_open:
                point = cluster_open.pop()
                neighbor_ids = set(get_neighbors(point_id_to_coords, open_point_ids, point))
                # For each neighboring point, add it to the cluster and remove it from the unvisited sets
                for neighbor in neighbor_ids:
                    cluster.add(neighbor)
                    cluster_open.add(neighbor)
                    open_point_ids.remove(neighbor)
    
            yield cluster
    
    
    points = {
        1: (1, 10),
        2: (1, 8),
        3: (2, 9),
        4: (2, 6),
        5: (3, 8),
        6: (3, 7),
        7: (3, 6),
        8: (6, 7),
        9: (7, 8),
        10: (7, 7),
        11: (8, 7),
        12: (8, 6),
        13: (10, 5),
        14: (11, 6),
        15: (12, 6),
        16: (12, 5),
        17: (13, 6),
        18: (13, 5),
    }
    
    for cluster in cluster_points(points):
        print([points[pid] for pid in cluster])