My setup:
I'm using OrientDB with a large graph of People vertices. I'm using the gremlin java driver to access this database since I would like to eventually switch to a different graph database down the line.
My database:
Each person has certain preference vertices (connected via a labeled edge describing that relation to that preference). All preferences are then connected to core concept vertex.
Problem I'm trying to solve:
I'm trying to find a way (kudos if its as simple as a Gremlin query) to start at a Person vertex and traverse down to all people with identical preferences via a core concept.
Here is a made up example of a matching case. Person B will be returned in a list of perfect matches of people when starting at Person A. I forgot to draw the directions to those edges on this picture :/ take a look at the non matching case to see the directions.
Here is an example of a non matching case. Person B will not be returned in a list of perfect matches of People. Why? Because all outgoing edges on Person B do not resolve to identically matching edges on Person A; in this case, Person A refuses to eat apples, but Person B doesn't list a similar preference to anything they refuse to eat.
Another non matching case from the above example: If Person A refuses to eat apples and Person B refuses to eat bananas -- they will not match.
If Person B likes Fries the most and likes Cheeseburgers the least, that would be a non-matching case as well.
My initial idea (that I'm not sure how to implement) with a query
Any ideas?
Let's start with the creation of your sample graph:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV("person").property("name", "Person A").as("pa").
......1> addV("person").property("name", "Person B").as("pb").
......2> addV("food").property("name", "Hamburgers").as("hb").
......3> addV("food").property("name", "Chips").as("c").
......4> addV("food").property("name", "Cheeseburgers").as("cb").
......5> addV("food").property("name", "Fries").as("f").
......6> addV("category").property("name", "Burgers").as("b").
......7> addV("category").property("name", "Appetizers").as("a").
......8> addE("most").from("pa").to("hb").
......9> addE("most").from("pb").to("cb").
.....10> addE("least").from("pa").to("c").
.....11> addE("least").from("pb").to("f").
.....12> addE("similar").from("hb").to("b").
.....13> addE("similar").from("cb").to("b").
.....14> addE("similar").from("c").to("a").
.....15> addE("similar").from("f").to("a").iterate()
The query, you're looking for, is the following (I will explain each step later):
gremlin> g.V().has("person", "name", "Person A").as("p").
......1> outE("most","least","refuses").as("e").inV().out("similar").
......2> store("x").by(constant(1)).
......3> in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
......4> groupCount().as("m").
......5> select("x").by(count(local)).as("c").
......6> select("m").unfold().
......7> where(select(values).as("c")).select(keys).values("name")
==>Person B
Now, when we add the "refuses to eat Apples" relation:
gremlin> g.V().has("person", "name", "Person A").as("p").
......1> addV("food").property("name", "Apples").as("a").
......2> addV("category").property("name", "Fruits").as("f").
......3> addE("refuses").from("p").to("a").
......4> addE("similar").from("a").to("f").iterate()
...Person B is no longer a match:
gremlin> g.V().has("person", "name", "Person A").as("p").
......1> outE("most","least","refuses").as("e").inV().out("similar").
......2> store("x").by(constant(1)).
......3> in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
......4> groupCount().as("m").
......5> select("x").by(count(local)).as("c").
......6> select("m").unfold().
......7> where(select(values).as("c")).select(keys).values("name")
gremlin>
Let's go through the query step by step / line by line:
g.V().has("person", "name", "Person A").as("p").
This should be pretty clear: start at Person A.
outE("most","least","refuses").as("e").inV().out("similar").
Traverse the out edges and set a marker, so that we can reference the edges later. Then traverse to what I called category
vertices.
store("x").by(constant(1)).
For every category
vertex add a 1
to an internal collection. You could also store the vertex itself, but this would be a waste of memory, since we won't need any information from the vertices.
in("similar").inE().where(eq("e")).by(label).outV().where(neq("p")).
Traverse the other direction along the similar
edges to the food
and then along those edges that have the same label as the marked edge from the beginning. In the end ignore the person where the traversal started (Person A).
groupCount().as("m").
Count the number of traversers that made it to each person vertex.
select("x").by(count(local)).as("c").
Count the number of Category
vertices (the 1
s).
select("m").unfold().
Unfold the person counters, so the keys will be the person vertices and the values will be the number of traversers that made it to this vertex.
where(select(values).as("c")).select(keys).values("name")
Ultimately the number of crossed category
vertices must match the number of traversers on a person
vertex. If that's the case, we have a match.
Note, that it's necessary to have a similar
edge incident to the Apples
vertex.