Search code examples
google-cloud-spanner

Does Spanner support recursive queries of some kind?


I'm prototyping something in Google Cloud Spanner and I need a way to do recursive queries like one would do in PostgreSQL / SQLite to process hierarchical or graph like data. I'm looking for syntax like https://sqlite.org/lang_with.html . Does Spanner have a way to do recursive queries like this or maybe a stored procedure like capability to do the recursion there? I can't find syntax for common table expressions.

More details:

Well, I'm modeling directed acyclic graphs (DAG) in Spanner. I'll need to do searches on the graphs. An example query is: "find all the sources connected to a specific node." A first attempt at modeling a graph in Spanner is the following:

CREATE TABLE nodes (
  node_id INT64 NOT NULL,
  node_label String(1024) NOT NULL,
  name String(1024)
) PRIMARY KEY (node_id);


CREATE TABLE out_edges (
  node_id INT64 NOT NULL,
  to_node_id INT64 NOT NULL,
  edge_label String(1024) NOT NULL,
) PRIMARY KEY (node_id, to_node_id),
  INTERLEAVE IN PARENT nodes ON DELETE CASCADE;


CREATE TABLE in_edges (
  node_id INT64 NOT NULL,
  from_node_id INT64 NOT NULL,
  edge_label String(1024) NOT NULL,
) PRIMARY KEY (node_id, from_node_id),
  INTERLEAVE IN PARENT nodes ON DELETE CASCADE;

The use of interleaved tables is to try to achieve quick access to a node's edges. Each unique edge exists in both the out_edges and in_edges tables.

One way to find all the sources connected to a specific node in a graph is to do a DFS or BFS starting at that node and following the edges in reverse order while keeping track of nodes that don't have any in edges as they are traversed. With this representation the two ways I can think of doing this is to use a recursive common table expression (like one could in SQLite or PostgreSQL or CONNECT BY in Oracle) or maintain the transitive closure of the graph and join nodes with no in-edges with those nodes in the transitive closure adjacent to the input node. The later would work in Spanner but requires the all that extra storage and maintenance of transitive closure. And without recursive query of some kind I'd be limited to finding sources a fixed number of edges away from the input node. I guess a different way to accomplish this in Spanner would be to execute the search algorithm in client code. The issue with that would be a potentially large number of queries to Spanner in order to satisfy the original query.


Solution

  • Cloud Spanner does not support recursive queries. If you provide more information about your use case, it's possible there's a good alternative solution.

    https://cloud.google.com/spanner/docs/query-syntax

    Based on additional context:

    It may make sense to do the search in client code, depending on your latency requirements and the shape/size of your graph. Consider whether maintaining a separate interleaved table that keeps track of sources per node is worthwhile. This approach may not generalize well, depending on the graph shape and queries you'll be supporting.

    CREATE TABLE sources (
      node_id INT64 NOT NULL,
      source_node_id INT64 NOT NULL,
    ) PRIMARY KEY (node_id, source_node_id),
      INTERLEAVE IN PARENT nodes ON DELETE CASCADE;