I've developed a graph clustering model for assigning destinations to vehicle routes, but my implementation is very slow. It takes about two days to process a graph with 400k nodes.
My current implementation in Python is as follows:
Input data is a sparse graph:
Edges are roads
Nodes are vehicle destinations and road intersections
Create Minimum Spanning Tree using Prims Algorithm
For every edge in the MST:
Perform depth-first-search on the the two subgraphs on each side of the edge:
Sum up road lengths for each edge
If total road length for one of the subgraphs is within a defined range, then remove the edge
Any recommendations to make this implementation faster? Could using Networkx or Neo4J speed this up?
Late update, but solution ultimately involved using Kruskal's algorithm instead of Primm's algorithm.
Rewriting my existing process in C++ dropped processing speed from 2 days to 10 hours.
Using Kruskals algorithm in Python dropped processing speed from 2 days to 1 hour.