Search code examples
postgresqlopenstreetmappostgistopologypgrouting

search recursively for dead-ends in topological network table


I've been trying for weeks to figure this out: I need to recursively search a topological network, OpenStreetMap streets in this case, for dead ends, and neighborhoods that hang from the rest of the network by only one edge. These are places where you might expect to see a no-exit sign if your city is considerate like that.

My table has a record for each edge in the network. Each edge has a 'target' and 'source' field, identifying the node to which that side of the edge is connected. I've added a binary column called 'dangling' to indicate whether the edge has been identified as a dea-ending segment. I initialize this column as FALSE, assuming the best.

So far, I've been able to get to identify simply branching dead-ends with the following SQL

WITH node_counts AS ( -- get all unique nodes
SELECT target AS node FROM edge_table WHERE NOT dangling
UNION ALL
SELECT source AS node FROM edge_table WHERE NOT dangling),
single_nodes AS ( -- select only those that occur once
SELECT node 
FROM node_counts
GROUP BY node
HAVING count(*) = 1
) -- 
UPDATE edge_table SET dangling = true 
FROM single_nodes
WHERE node = target OR node = source;

I simply keep running this query until no rows are updated. The result looks like this(red is dangling = true):

https://i.sstatic.net/OE1rZ.png

Excellent! This is working great...but there are still cul-de-sac neighborhoods if you will, which are only connected to the larger network by one edge. How can I identify those?

My best guess is that I'm going to need a WITH RECURSIVE at some point, but that's about as far as my unmathmatical mind will go. Can anyone point me in the right direction?


Solution

  • OK. Here's how I figured it out:

    I decided that there was not a way, or least not an easy way to implement this in SQL alone. I ended up implementing Tarjan's Algorithm in PHP and SQL, creating a temporary nodes table which linked each node to a strongly connected subcomponent of the graph. Once that was done, I updated any segment that was touching a node which did not belong to the largest subcomponent, as 'dangling'. All edges therefor that started and ended at nodes belonging to the largest subcomponent belong to the main street network (not dangling).

    Here's the code. Note that it can take a very long time to run on a large graph. It's also pretty hard on the working memory, but it worked for my purposes.

    <?php
    $username = '';
    $password = '';
    $database = '';
    
    $edge_table = 'cincy_segments';
    $v1 = 'target';
    $v2 = 'source';
    
    $dangling_boolean_field = 'dangling';
    $edge_id_field = 'edge_id';
    
    //global variables declared
    $index = 0;
    $component_index = 0;
    $nodes = array();
    $stack = array();
    
    pg_connect("host=localhost dbname=$database user=$username password=$password");
    
    // get vertices
    echo "getting data from database\n";
    $neighbors_query = pg_query("
    WITH nodes AS (
        SELECT DISTINCT $v1 AS node FROM $edge_table
        UNION
        SELECT DISTINCT $v2 AS node FROM $edge_table
    ), 
    edges AS (
    SELECT 
        node,
        $edge_id_field AS edge
    FROM nodes JOIN $edge_table
        ON node = $v1 OR node = $v2
    )
    SELECT
        node,
        array_agg(CASE WHEN node = $v2 THEN $v1 
        WHEN node = $v1 THEN $v2
        ELSE NULL
        END) AS neighbor    
    FROM edges JOIN $edge_table ON 
        (node = $v2 AND edge = $edge_id_field) OR 
        (node = $v1 AND edge = $edge_id_field)
    GROUP BY node");
    
    // now make the results into php results
    echo "putting the results in an array\n";
    while($r = pg_fetch_object($neighbors_query)){ // for each node record
        $nodes[$r->node]['id'] = $r->node;
        $nodes[$r->node]['neighbors'] = explode(',',trim($r->neighbor,'{}'));
    }
    
    // create a temporary table to store results
    pg_query("
        DROP TABLE IF EXISTS temp_nodes;
        CREATE TABLE temp_nodes (node integer, component integer);
    ");
    
    // the big traversal
    echo "traversing graph (this part takes a while)\n";
    foreach($nodes as $id => $values){
        if(!isset($values['index'])){
            tarjan($id, 'no parent');
        }
    }
    
    // identify dangling edges
    echo "identifying dangling edges\n";
    pg_query("
        UPDATE $edge_table SET $dangling_boolean_field = FALSE; 
        WITH dcn AS ( -- DisConnected Nodes
            -- get nodes that are NOT in the primary component
            SELECT node FROM temp_nodes WHERE component != (
                -- select the number of the largest component
                SELECT component
                FROM temp_nodes 
                GROUP BY component 
                ORDER BY count(*) DESC
                LIMIT 1)
        ),
        edges AS (
            SELECT DISTINCT e.$edge_id_field AS disconnected_edge_id
            FROM 
                dcn JOIN $edge_table AS e ON dcn.node = e.$v1 OR dcn.node = e.$v2
        )
        UPDATE $edge_table SET $dangling_boolean_field = TRUE
        FROM edges WHERE $edge_id_field = disconnected_edge_id;
    ");
    
    // clean up after ourselves
    echo "cleaning up\n";
    pg_query("DROP TABLE IF EXISTS temp_nodes;");
    pg_query("VACUUM ANALYZE;");
    
     // the recursive function definition
    //
    function tarjan($id, $parent)
    {
        global $nodes;
        global $index;
        global $component_index;
        global $stack;
    
        // mark and push
        $nodes[$id]['index'] = $index;
        $nodes[$id]['lowlink'] = $index;
        $index++;
        array_push($stack, $id);
    
        // go through neighbors
        foreach ($nodes[$id]['neighbors'] as $child_id) {
            if ( !isset($nodes[$child_id]['index']) ) { // if neighbor not yet visited
                // recurse
                tarjan($child_id, $id);
                // find lowpoint
                $nodes[$id]['lowlink'] = min(
                    $nodes[$id]['lowlink'],
                    $nodes[$child_id]['lowlink']
                );
            } else if ($child_id != $parent) { // if already visited and not parent
                // assess lowpoint
                $nodes[$id]['lowlink'] = min(
                    $nodes[$id]['lowlink'],
                    $nodes[$child_id]['index']
                );
            }
        }
        // was this a root node?
        if ($nodes[$id]['lowlink'] == $nodes[$id]['index']) {
            do {
                $w = array_pop($stack);
                $scc[] = $w;
            } while($id != $w);
            // record results in table
            pg_query("
                INSERT INTO temp_nodes (node, component)
                VALUES (".implode(','.$component_index.'),(',$scc).",$component_index) 
            ");
            $component_index++;
        }
        return NULL;
    }
    
    ?>