Search code examples
machine-learningcluster-analysisunsupervised-learning

Distance function for DBSCAN


I will like to use a clustering algorithm to find a clustering for a big Digraph, and I will like remove noise from this graph too. So, I was thinking to use the DBSCAN approach, because I saw that we can give to the algorithm a distance function for determining the distance/similarity between two different nodes.

My question is, how can I define a distance function which increases the similarity between two nodes closes in terms of hops and decrease when a node is isolated.

I don't have coordinates or node attributes, so I can not use those. I only have the topology of the graph.

The expected output will be something like this:

enter image description here

I'm really concern about the complexity of the solution. How can approximate a clustering with a linear complexity ...


Solution

  • What is wrong with the obvious?

    Distance(a,b) = length of shortest path, or infinity if there is none.

    You probably should take directions into account, so a0 to a3 ist 1.