I will like to use a clustering algorithm to find a clustering for a big Digraph, and I will like remove noise from this graph too. So, I was thinking to use the DBSCAN approach, because I saw that we can give to the algorithm a distance function for determining the distance/similarity between two different nodes.
My question is, how can I define a distance function which increases the similarity between two nodes closes in terms of hops and decrease when a node is isolated.
I don't have coordinates or node attributes, so I can not use those. I only have the topology of the graph.
The expected output will be something like this:
I'm really concern about the complexity of the solution. How can approximate a clustering with a linear complexity ...
What is wrong with the obvious?
Distance(a,b) = length of shortest path, or infinity if there is none.
You probably should take directions into account, so a0 to a3 ist 1.