Search code examples
pythongraph-theoryhierarchical-clustering

Finding connectedness from a list of nodes and a list of edges


(tl;dr)

Given a collection of nodes defined as a dictionary of points, and a collection of edges defined as a dictionary of key tuples, is there an algorithm in python to easily find consecutive segments?

(context:)

I have two files that model segments of a road network.

Nodes.txt:

1;-5226574;-3118329 # latitude and longitude as integers (multiplied by 1e5)
2;-5226702;-3118330
3;-5226750;-3118332
4;-5226793;-3118338
...

Edges.txt:

1;1;2
2;3;5
3;23;345
4;23;11
...

Each edge represents an (indexed) link between two nodes, by the node indexes.

A sub-section of the generated network is plotted below:

enter image description here

As you can see, the overwhelming majority of nodes is a "simple" node, meaning it is in the middle of a road segment and belongs to exactely two edges. On the other hand, there are "special" nodes, meaning they represent a bifurcation, or crossroad, because it belongs to three or more edges.

Currently, I have a collection of isolated road segments, but I would like to have each road segment between two special nodes defined as a sequence of nodes. It makes everything much faster to plot, to measure distances, etc., and it also allows me to represent each node sequence as a "super edge" linking two special nodes, thus simplifying the topology.

I can easily imagine some brute-force way to do that, but the amount of nodes is relatively high, and I don't have a theoretical background that points me a way to solve this.

UPDATE:

I have created a gist with my raw data. Each line represents a road as a sequence of points (lat, lon), and the roads overlap a lot. My goal is to generate the dictionaries for nodes and links from this "list of roads" in the file.

You can use the following python script to access the content:

with open('RawRoads.txt') as roadsFile:
    for line in roadsFile.readlines():
        road = [tuple(map(lambda x:int(float(x)*1e5), coord.split(','))) for coord in line.strip().split(' ')]

or else:

import urllib

url = "https://gist.githubusercontent.com/heltonbiker/ca043f8ee191db5bf8349b1b7af0394c/raw/RawRoads.txt"

lines = urllib.urlopen(url).readlines() 
for line in lines:
    # you got the idea

Solution

  • Let's not get too brutish. I think we can do well by building a simple list of lists, such that edge[i] is a list of up to three elements, the nodes to which node i is connected. If your node numbers are dense and start near 0, you can use a list; in case they're not, I'll use a directory.

    I build a list from edges.txt in the form of

    edge_list = [(1,2), (2,3), (3,5), (2, 23), (23,345), (23,11), ...]

    Now build the two-way edge reference directory:

    Next, identify the special nodes, those with an order other than 2: intersections and map edges. Then we pick one and build a segment until we hit another.

    # Dictionary of edges, indexed in both directions by node number.
    edge = {}
    
    # Ingest the data and build teh dictionary
    with open("edges.txt") as efile:
        for line in efile:
            eid, src, dst = line.strip().split(';')
            src = int(src)
            dst = int(dst)
    
            for key, val in [(src, dst), (dst, src)] :
                if key in edge:
                    edge[key].append(val)
                else:
                    edge[key] = ([val])
    print "edge dictionary has entries:", len(edge)
    
    # Identify endpoint nodes: order other than 2
    end_ct = 0
    print "Endpoint Nodes"
    endpoint = []
    for src, dst in edge.iteritems():
        if len(dst) != 2:
            print len(dst), src, dst
            endpoint.append(src)
            end_ct += len(dst)
    print end_ct, "road ends"
    
    atlas = []    # List of roads, each a list of nodes
    
    # Build roads between the identified endpoints
    # Pick the first endpoint in the remaining list.
    # Move to the first-listed adjacent node.
    # Keep going until we hit another node on the endpoint list.
    while len(endpoint) > 0:
        here = endpoint[0]
    #   print "Road starting at", here, edge[here]
    
        # Pick a first step and consume the edge
        next = edge[here].pop(0)
        edge[next].remove(here)
        road = [here, next]
    
        # If that's the last connection to the node, remove that node from the endpoints list.
        if len(edge[here]) == 0:
            del endpoint[0]
            del edge[here]
        # Special case for a one-segment road; endpoint entry of "next" is removed after the loop
        if len(edge[next]) == 0:
            del edge[next]
    
        # Consume edges until we reach another endpoint.
        debug = False
        while next not in endpoint:
            here = next
            next = edge[here].pop(0)
            edge[next].remove(here)
            road.append(next)
            if len(edge[next]) == 0:
                del edge[next]
    #           print "removing node", next
    
        if next not in edge:
            endpoint.remove(next)
    #       print "removing endpoint", next
    
        print "\nRoad from", road[0], "to", road[-1], ':\n\t', road
        atlas.append(road)
    
    print "\n", len(atlas), "roads built"
    # print "edge dictionary still has entries:", len(edge)
    

    EDIT FROM OP:

    It worked, it's fast and correct, and I find it deserves a visualization:

    import matplotlib.pyplot as plt
    
    for road in atlas:
        path = [nodesdict[i] for i in road]
        lons, lats = zip(*path)
        plt.plot(lons, lats)
    
    plt.grid()
    plt.axis('equal')
    plt.show()