Search code examples
pythonmatrixcluster-analysisshapes

Assign 2D clustered coordinates to shapes


I consider the case of a 2D (10x10) matrix, currently filled with zeros (in a sense, a non-active grid). At some point, some matrix elements will become active (so, a value of 1 will be attributed).

The way this happens is by passing a list of (X,Y) coordinates on the matrix that will be active. The order or which elements will come first is unknown. The areas that become active on the matrix can be as big as just one element in the matrix (1x1) or some parts of the grid (clusters) become active in a specific pattern, as in this example:

Clusters of information on a matrix

For start, I am able to bunch together neighboring elements that are active and get some information about that "cluster": the number of active elements in a cluster, as well as the row and column width. For example, the top right cluster has 3 elements active, a row width of 2 and a column width of 2.

My aim is to match these clusters to pre-defined shapes, identified by their ID:

Shapes

Having the number of active elements for each of the clusters, a first coarse division into categories can be done:

  • If one element is active in a cluster -> Shape ID 0
  • If two elements are active in a cluster -> Shape ID 1 or 2
  • If three elements are active in a cluster -> Shape ID 3 - 8
  • If four elements are active in a cluster -> Shape ID 9 - 27

Making use of the second type of information (row, column width of a cluster of active elements), each category can be split up again. Taking the category with three elements active in a cluster:

  • If the column width is 3 and row width is 1 -> Shape ID 5
  • If the column width is 2 and row width is 2 -> Shape ID 3, 4, 6 or 7
  • If the column width is 1 and row width is 3 -> Shape ID 8

The same can be done for the clusters of four.

Following this logic, I can solve my problem and assign some clusters to their correct shape. My problem arises now with shapes where the information (about the number of elements active, their row and column width) is insufficient.

An example are the top right and lower right clusters of Fig. 1. Here, both have 3 elements active, column and row widths = 2.

How can I divide this yet again and assign the correct shapes?


Solution

  • Maybe try "standardizing" your cluster coordinates - that is, given a list C = [(i1,j1)...(ik,jk)] of coordinates in a cluster (a representation of the cluster), let (i*,j*) be the lexicographically first point in the list (i.e. bottom left-most point). Subtract i* and j* from all of the cluster coordinates, sort the coordinates, and then hash the sorted coordinates. You are guarenteed that unique cluster shapes have unique hashes, and clusters with the same shape now have different hashes. This is done without assigning specific cluster ID's, but they can easily be recovered by looking at the key set of your hash table. Hope this helps (you can easily hash all of the "shapes" you defined to assign your cluster labels before/after and store those in a separate hash table to get the id)

    import numpy as np
    
    #assume we have clusters of the form
    #C = [[i1,j1],[i2,j2],...,[ik,jk]]
    #let "Clusters" be the set of all such Cs
    #let "Shapes" be a set of tuples containing the
    #coordinates of all the shapes you defined, 
    #plus the corresponding id
    #i.e. Shapes = [[shape1,ID1],[shape2,ID2]...]
    
    def gen_hash(cluster):
        sorted = np.lexsort(cluster)
        smallest = sorted[0]
        sorted[:,0]-=smallest[0]
        sorted[:,1]-=smallest[1]
        return str(sorted)
    
    shape_ids = {gen_hash(shape):ID for (shape,ID) in Shapes}
    
    for c in Clusters:
        c_hash = gen_hash(c)
        c_id = shape_ids[c_hash]
        print('this cluster has ID :',c_id)