Search code examples
matlabchartspoint

How can I cluster points which are connected in MATLAB?


Imagine we have many points which some of them are connected together and we want to cluster them.

Please take a look at the following figure.

enter image description here

If we have the "connectivity matrix" of points, how we can cluster them in two group (groups of connected points)?

ConnectivityMatrix=
                    [1 2
                     1 3
                     2 4
                     2 3
                     2 1
                     3 1
                     3 2
                     3 4
                     4 3
                     4 2
                     5 8
                     5 7
                     5 6
                     6 5
                     6 7
                     7 6
                     7 5
                     7 8
                     8 7
                     8 5]

The final result should be nodes of 1,2,3,4 in a first group(cluster) and nodes of 5,6,7,8 in a second group (cluster).


Solution

  • Here is some code to get you started. I basically implemented Depth First Search... a very crude implementation of it anyway.

    Depth First Search is an algorithm that is used for the traversal of trees. Graphs are essentially a special case of trees where there are leaf nodes that connect back to the root. The basic algorithm for Depth First Search is as so:

    • Start at the root of the tree and add this to a stack
    • For each node that is connected to the root, add this onto the stack and place this in a list of visited nodes
    • While there is still a node on the stack...
      1. Pop off a node off the stack
      2. Check the visited nodes list. If this is a node we have already visited, skip.
      3. Else, visit any nodes that are connected to this node we popped off and add to the stack

    If we have disconnected graphs like what you have above, we basically run Depth First Search multiple times. Each time will be for one cluster. After one Depth First Search result, we will discover nodes that belong to one cluster. We restart Depth First Search again with any node that we have not touched yet, which will be a node from another cluster that we have not visited. As you clearly have two clusters in your graph structure, we will have to run Depth First Search two times. This is commonly referred to as finding all connected components in an overall graph.

    To find the Connected Components, here are the steps that I did:

    1. Create a connectivity matrix
    2. Initialize a Boolean list that tells us whether or not we have visited a node in your graph
    3. Initialize an empty cluster list
    4. Initialize an empty stack that contains which nodes we should visit.
    5. While there is at least one node we need to visit...
      • Find such a node
      • Initialize our stack to contain this node
      • While our stack is not empty
        1. Pop off a node from the stack
        2. If we have visited this node, continue
        3. Else mark as visited
        4. Retrieve all nodes connected to this node
        5. Remove those nodes that are not on the stack in (4)
        6. Add these nodes to the stack and the cluster list
    6. Once the stack is empty, we have a list of all of the nodes contained in a single cluster. Add this cluster to a final list.
    7. Repeat 1 - 6 until we visit all nodes

    Without further ado, this is the code. Bear in mind this is not battle tested. If you have graph structures that will generate an error, that'll be on your own to fix :)

    ConnectivityMatrix = [1 2
                         1 3
                         2 4
                         2 3
                         2 1
                         3 1
                         3 2
                         3 4
                         4 3
                         4 2
                         5 8
                         5 7
                         5 6
                         6 5
                         6 7
                         7 6
                         7 5
                         7 8
                         8 7
                         8 5];
    
    %// Find all possible node IDs
    nodeIDs = unique(ConnectivityMatrix(:));
    
    %// Flag that tells us if there are any nodes we should visit
    nodeIDList = false(1,numel(nodeIDs));
    
    %// Stores our list of clusters
    clusterList = {};
    
    %// Keeps track of how many clusters we have
    counter = 1;
    
    %// Stack - initialize to empty
    stackNodes = [];
    
    %// While there is at least one node we need to visit
    while (~all(nodeIDList))    
        % Find any node
        stackNodes = find(nodeIDList == false, 1);
        % Initialize our stack to contain this node
        nodesCluster = stackNodes;
    
        %// While our stack is not empty
        while (~isempty(stackNodes))
            % Grab the node off the stack and pop off
            node = stackNodes(end);
            stackNodes(end) = [];        
    
            %// If we have marked this node as visited, skip
            if (nodeIDList(node))
                continue;
            end
    
            %// Mark as visited 
            nodeIDList(node) = true;
    
            %// Retrieve all nodes connected to this node
            connectedNodes = ConnectivityMatrix(ConnectivityMatrix(:,1) == node, :);
            nodesToVisit = unique(connectedNodes(:,2).');
    
            %// Remove those already visited
            visitedNodes = ~nodeIDList(nodesToVisit);
            finalNodesToVisit = nodesToVisit(visitedNodes);
    
            %// Add to cluster
            nodesCluster = unique([nodesCluster finalNodesToVisit]);
    
            %// Add to stack
            stackNodes = unique([stackNodes finalNodesToVisit]);                
        end
    
        %// Add connected components to its own cluster
        clusterList{counter} = nodesCluster;
        counter = counter + 1;
    end
    

    Once we have run this code, we can display our clusters like so:

    celldisp(clusterList)
    
    clusterList{1} =
    
     1     2     3     4
    
    
    clusterList{2} =
    
     5     6     7     8
    

    As such, cluster #1 contains nodes 1,2,3,4 while cluster #2 contains nodes 5,6,7,8.

    Bear in mind that this code will only work if you sequentially label your nodes as you did in your diagram. You can't skip any label numbers (i.e. you can't do 1,2,4,6,9, etc. This should be 1,2,3,4,5).

    Good luck!