Search code examples
graphgremlintinkerpoptinkerpop3

How to match() on an accumulated value?


enter image description here

Given this ultra-simple graph, with unknown number of vertices between A and Z, and where each edge has a distance (dist) property, I have to answer the following questions:

  1. Does all of the following hold:
  • There is a vertex called A
  • A is eventually (as opposed to directly) followed by a vertex named B at a maximum distance of 500
  • B is eventually followed by C at a maximum distance of 200?
  1. What is the cumulative distance A->B + B->C?

If I disregard the distance constraints, I can find what I need by doing the following:

g.V().match(
            as("a").has("name", "A"), //A exists
            as("a").out()
                    .until(has("name", "B"))
                    .repeat(out())
                    .as("b"),       //A is eventually followed by B
            as("b").out()
                    .until(has("name", "C"))
                    .repeat(out())
                    .as("c"))       //B is eventually followed by C
            .select("a", "b", "c").by("name")

From here, I can simply call hasNext() to check if all the conditions are satisfied, and next() to get the actual matching vertices. So far so good. But how do I add distance constraints now?

Note: One reason I'm using match() is to have a stateless query, decoupled from any specific graph, as I have thousands of graphs and run the same query on each. Another is that parts of the query are generated dynamically, and the declarative syntax lends itself much better to this use-case. Yet another is that I absolutely do not understand how to select the matched vertices using the imperative traversal 🤷‍♂️


Solution

  • Track individual distances via sack(). Also use the current sack value to cancel traversals early (unless distances can be negative). Once you find a checkpoint, store the sack's value in a global side-effect. Ultimately calculate the sum of all stored distances to determine the cumulative distance.

    In code:

    g.V().match(
                __.as("a").has("name", "A"),
                __.as("a").sack(assign).
                             by(constant(0)).
                           repeat(outE().sack(sum).by("dist").
                                  filter(sack().is(lte(500))).inV()).
                             until(has("name", "B")).
                           group("dist").
                             by(constant("a-b")).
                             by(sack()).as("b"),
                __.as("b").sack(assign).
                             by(constant(0)).
                           repeat(outE().sack(sum).by("dist").
                                  filter(sack().is(lte(1000))).inV()).
                             until(has("name", "C")).
                           group("dist").
                             by(constant("b-c")).
                             by(sack()).as("c")).
                select("a", "b", "c", "dist").
                  by("name").
                  by("name").
                  by("name").
                  by(select(values).sum(local))
    

    If distances can have negative values, you'll need to remove the filter() steps.