Search code examples
neo4jcluster-analysisk-meanssdn

Network graph clustering


I have a large-scale network consisting of 62,578 nodes. It represents the network topology of an SDN network (Software Defined Networking).

  • I want to partition the graph into a number of clusters, each cluster should be controlled by an SDN controller.

  • I tried to use the k-means algorithm, but it doesn't take into account the relationships between the nodes. This algorithm relies on the nodes and their properties.

  • Then I tried the similarity algorithm, but it calculates the similarity score between 2 nodes and creates a new relationship that holds this value between these 2 nodes. As a result, I couldn't benefit from this way in k-means algorithm.

  • Louvain and Leiden don't allow the number of clusters in advance. Is there any way to do that with these algorithms?

Suggestions Please. Any idea may be of great help to me. Many thanks.

Update: This photo is part of the whole graph.

enter image description here

Update 2:

CALL gds.graph.project  
(
    'myGraph',
    'node',
    'to',
    {
        relationshipProperties: 'cost'
    }
)

CALL gds.fastRP.write
(
  'myGraph',
  {
    embeddingDimension: 1,
    writeProperty: 'fastrp-embedding'
  }
)
YIELD nodePropertiesWritten  

 CALL gds.graph.project
    (
        'myGraph_1',
        {
          node: {
            properties: 'fastrp-embedding'
          }
        },
        '*'
    )

CALL gds.alpha.kmeans.write 
('myGraph_1', 
{
  nodeProperty: 'fastrp-embedding',
  k: 3,
  randomSeed: 42,
  writeProperty: 'kmeans'
})
YIELD nodePropertiesWritten

Update 3:

I applied FastRP to create node embeddings for a graph consisting of 6,301 nodes. These embeddings are stored as node properties to be used in clustering using the K-means algorithm. I noticed that although the nodes are nearby to each other, they are assigned to different clusters.

Notes:

  • For FastRP, I set the embedding dimensions to 256.

  • For K-means, I set K to 2.

  • I tried a smaller embedding equals 2, 4, etc, The same results occurred.

  • In addition, I tried another graph size with 8,846 nodes. Similar incomprehensible results occurred.

  • I didn't specify a random seed for FastRP. I didn't know how to set a preferred value for this parameter. I it related to the graph size like the node embedding?

For the sub-graph below, the following are the results of clustering:

enter image description here enter image description here


Solution

  • If you want to define the number of clusters, you can try the following.

    First create node embeddings using any of the available models like the FastRP or node2vec. Next, use the k-means algorithm to cluster nodes based on the embeddings.

    Use the mutate mode for the FastRP and store the results under lets say embedding property. Next, run k-means algorithm on the embeddings like in the example:

    CALL gds.alpha.kmeans.write('cities', {
      nodeProperty: 'embedding',
      k: 3,
      writeProperty:'cluster'
    })
    

    where the k parameter defines the number of clusters and the nodeProperty should point to the embedding property.