orientdb graph-databases tinkerpop tinkerpop3

Finding vertices with exact edge matches traversing through children

My setup:

I'm using OrientDB with a large graph of People vertices. I'm using the gremlin java driver to access this database since I would like to eventually switch to a different graph database down the line.

My database:

Each person has certain preference vertices (connected via a labeled edge describing that relation to that preference). All preferences are then connected to core concept vertex.

Problem I'm trying to solve:

I'm trying to find a way (kudos if its as simple as a Gremlin query) to start at a Person vertex and traverse down to all people with identical preferences via a core concept.

Here is a made up example of a matching case. Person B will be returned in a list of perfect matches of people when starting at Person A. I forgot to draw the directions to those edges on this picture :/ take a look at the non matching case to see the directions.

Here is an example of a non matching case. Person B will not be returned in a list of perfect matches of People. Why? Because all outgoing edges on Person B do not resolve to identically matching edges on Person A; in this case, Person A refuses to eat apples, but Person B doesn't list a similar preference to anything they refuse to eat.

Another non matching case from the above example: If Person A refuses to eat apples and Person B refuses to eat bananas -- they will not match.

If Person B likes Fries the most and likes Cheeseburgers the least, that would be a non-matching case as well.

My initial idea (that I'm not sure how to implement) with a query

I would start at person A
Find all outgoing edges to preference vertices and store some sort of "marker" or map to that preference vertex with the edge label.
Traverse out of those vertices down all SimilarTo labeled edges. Copy those markers from preference vertex into the concept vertex.
Reverse the line: concept vertex -> preference vertex (copy makers from concept to preference vertex)
... then somehow match ALL edges to those markers...
exclude person a from the results

Any ideas?

Solution

Let's start with the creation of your sample graph:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV("person").property("name", "Person A").as("pa").
......1>   addV("person").property("name", "Person B").as("pb").
......2>   addV("food").property("name", "Hamburgers").as("hb").
......3>   addV("food").property("name", "Chips").as("c").
......4>   addV("food").property("name", "Cheeseburgers").as("cb").
......5>   addV("food").property("name", "Fries").as("f").
......6>   addV("category").property("name", "Burgers").as("b").
......7>   addV("category").property("name", "Appetizers").as("a").
......8>   addE("most").from("pa").to("hb").
......9>   addE("most").from("pb").to("cb").
.....10>   addE("least").from("pa").to("c").
.....11>   addE("least").from("pb").to("f").
.....12>   addE("similar").from("hb").to("b").
.....13>   addE("similar").from("cb").to("b").
.....14>   addE("similar").from("c").to("a").
.....15>   addE("similar").from("f").to("a").iterate()

The query, you're looking for, is the following (I will explain each step later):

gremlin> g.V().has("person", "name", "Person A").as("p").
......1>   outE("most","least","refuses").as("e").inV().out("similar").
......2>   store("x").by(constant(1)).
......3>   in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
......4>   groupCount().as("m").
......5>   select("x").by(count(local)).as("c").
......6>   select("m").unfold().
......7>   where(select(values).as("c")).select(keys).values("name")
==>Person B

Now, when we add the "refuses to eat Apples" relation:

gremlin> g.V().has("person", "name", "Person A").as("p").
......1>   addV("food").property("name", "Apples").as("a").
......2>   addV("category").property("name", "Fruits").as("f").
......3>   addE("refuses").from("p").to("a").
......4>   addE("similar").from("a").to("f").iterate()

...Person B is no longer a match:

gremlin> g.V().has("person", "name", "Person A").as("p").
......1>   outE("most","least","refuses").as("e").inV().out("similar").
......2>   store("x").by(constant(1)).
......3>   in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
......4>   groupCount().as("m").
......5>   select("x").by(count(local)).as("c").
......6>   select("m").unfold().
......7>   where(select(values).as("c")).select(keys).values("name")
gremlin>

Let's go through the query step by step / line by line:

g.V().has("person", "name", "Person A").as("p").

This should be pretty clear: start at Person A.

outE("most","least","refuses").as("e").inV().out("similar").

Traverse the out edges and set a marker, so that we can reference the edges later. Then traverse to what I called category vertices.

store("x").by(constant(1)).

For every category vertex add a 1 to an internal collection. You could also store the vertex itself, but this would be a waste of memory, since we won't need any information from the vertices.

in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).

Traverse the other direction along the similar edges to the food and then along those edges that have the same label as the marked edge from the beginning. In the end ignore the person where the traversal started (Person A).

groupCount().as("m").

Count the number of traversers that made it to each person vertex.

select("x").by(count(local)).as("c").

Count the number of Category vertices (the 1s).

select("m").unfold().

Unfold the person counters, so the keys will be the person vertices and the values will be the number of traversers that made it to this vertex.

where(select(values).as("c")).select(keys).values("name")

Ultimately the number of crossed category vertices must match the number of traversers on a person vertex. If that's the case, we have a match.

Note, that it's necessary to have a similar edge incident to the Apples vertex.