gremlin graph-traversal tinkerpop3

How to query for vertices with multiple edges


What is the best way to query for the following:

(V:Button {color: 'red'})<-[E:Touched {t_date: date}]-(V:User)

As a small example: 1 button and 3 users, where each user has multiple edges, with various t_dates, to the button with color red. There is only 1 red button, but users and interactions will number in the millions. I am assuming that starting from the button is the way to go. All properties have appropriate indexes to support range queries, etc.

  1. Count of users who touched the button between date A and B.
  2. Count of users who touched the button between date Z and Y AND didn't touch it between date A and B.
  3. Count of users who touched the button between date A and B AND also between date Z and Y.

Thank you!


Solution

    Count of users who touched button between date A and B.

    g.V().has("Button", "color", "red").
      inE("Touched").has("t_date", between(A, B)).outV().dedup().count()
    

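    Count of users who touched button between date Z and Y AND didn't touch it between date A and B.

    g.V().has("Button", "color", "red").as("b").
      // Touched edges point User -> Button, so outV() on an incoming edge yields the user.
      sideEffect(inE("Touched").has("t_date", between(A, B)).outV().aggregate("x")).
      // Keep only users from the Z..Y window that were not collected in "x".
      inE("Touched").has("t_date", between(Z, Y)).outV().where(without("x")).dedup().count()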
    

    Count of users who touched button between date A and B AND also between date Z and Y.

    g.V().has("Button", "color", "red").as("b").
      sideEffect(inE("Touched").has("t_date", between(A, B)).outV().aggregate("x")).
      // Keep only users from the Z..Y window that were also collected in "x".
      inE("Touched").has("t_date", between(Z, Y)).outV().where(within("x")).dedup().count()
    

    You can remove dedup() if there can only be one edge between a user and a button. The only thing I'm worried about is that "users and interactions will be in millions": with a single button vertex carrying all of those edges, your model won't scale, and traversing millions of edges won't perform well (if at all).
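
To sanity-check the three counts on a small dataset, the set logic behind the traversals can be modeled in plain Python. This is only a sketch of the semantics, not Gremlin itself: the `Touch` record, the `users_between` helper, and the integer dates are all hypothetical stand-ins. It assumes the same half-open range as Gremlin's `between()` (start inclusive, end exclusive).

```python
from typing import Iterable, NamedTuple, Set


class Touch(NamedTuple):
    """One Touched edge from a user to the single red button (hypothetical model)."""
    user: str
    t_date: int  # stand-in for a date; any comparable type would work


def users_between(touches: Iterable[Touch], start: int, end: int) -> Set[str]:
    """Users with at least one touch where start <= t_date < end.

    The half-open range mirrors Gremlin's between(); returning a set
    plays the role of dedup().
    """
    return {t.user for t in touches if start <= t.t_date < end}


# Three users, several touches each, all on the one red button.
touches = [
    Touch("u1", 1), Touch("u1", 12),
    Touch("u2", 2), Touch("u2", 3),
    Touch("u3", 11), Touch("u3", 15),
]
A, B = 0, 10   # first window (made-up values)
Z, Y = 10, 20  # second window (made-up values)

in_ab = users_between(touches, A, B)  # corresponds to aggregate("x")
in_zy = users_between(touches, Z, Y)

count_1 = len(in_ab)          # query 1: touched between A and B
count_2 = len(in_zy - in_ab)  # query 2: where(without("x"))
count_3 = len(in_zy & in_ab)  # query 3: where(within("x"))
print(count_1, count_2, count_3)
```

Here u1 and u2 fall in the first window and u1 and u3 in the second, so query 2 keeps only u3 and query 3 keeps only u1. Running the Gremlin traversals against the same data in a test graph should give matching counts.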