Search code examples
pythonalgorithmsearchgraph-theoryminimum-spanning-tree

Fastest Algorithm/Software for MST and Searching on Large Sparse Graphs


I've developed a graph clustering model for assigning destinations to vehicle routes, but my implementation is very slow. It takes about two days to process a graph with 400k nodes.

My current implementation in Python is as follows:

Input data is a sparse graph:
       Edges are roads
       Nodes are vehicle destinations and road intersections

Create Minimum Spanning Tree using Prims Algorithm

For every edge in the MST:
      Perform depth-first-search on the the two subgraphs on each side of the edge:
              Sum up road lengths for each edge
      If total road length for one of the subgraphs is within a defined range, then remove the edge

Any recommendations to make this implementation faster? Could using Networkx or Neo4J speed this up?


Solution

  • Late update, but solution ultimately involved using Kruskal's algorithm instead of Primm's algorithm.

    Rewriting my existing process in C++ dropped processing speed from 2 days to 10 hours.

    Using Kruskals algorithm in Python dropped processing speed from 2 days to 1 hour.