
Gremlin - The order in which vertices are returned when using limit with range queries with JanusGraph


I am reading the book Practical Gremlin, section 3.4.1, "Retrieving a range of vertices", which says:

"There is no guarantee as to which airport vertices will be selected as this depends upon how they are stored by the back end graph. Using TinkerGraph the airports will most likely come back in the order they are put into the graph. This is not likely to be the case with other graph stores such as JanusGraph. So do not rely on any sort of expectation of order when using range to process sets of vertices."

Suppose JanusGraph holds millions of vertices and I run the query below:

g.V().limit(10000).range(0,100) -> will the result always be consistent?

Or will it return different vertices for different Gremlin Query executions?

Whenever I run g.V().limit(10000).range(0,100) against JanusGraph, it returns the same results. Can I rely on this and expect the results to be the same for any number of query executions?

Please suggest

I always need the first 100 results and want to avoid an order by, since it could be inefficient on a graph with millions of vertices. So I need to be sure that the query above always returns consistent results.


Solution

  • The limit and range steps do not change the order of the incoming results; they only pass through a slice of them. The order is determined by the steps that come before limit, which in this case is the GraphStep (g.V()).

    That order can depend on the following (among other things):

    1. Storage mechanism: TinkerPop lets graph providers store and index data however they choose and only enforces step semantics. This means g.V() must fetch all vertices, but the order in which they are fetched depends on the storage layer.
    2. Parallel execution: if a query engine runs steps on parallel threads, the output order of any step that does not guarantee ordering can be non-deterministic.
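
The first point can be imitated without a graph database. The sketch below is plain Python, not Gremlin: `scan`, `limit` and `range_` are hypothetical stand-ins for g.V(), limit() and range(), and the seeded shuffle is only an assumption that models a backend whose iteration order follows its storage layout.

```python
import random
from itertools import islice

def scan(vertex_ids, seed):
    """Hypothetical stand-in for g.V(): yields every id in a backend-chosen order."""
    ids = list(vertex_ids)
    random.Random(seed).shuffle(ids)  # assumption: the storage layout decides the order
    return iter(ids)

def limit(stream, n):
    """Stand-in for limit(n): passes through the first n items, order untouched."""
    return islice(stream, n)

def range_(stream, low, high):
    """Stand-in for range(low, high): passes through items low..high-1, order untouched."""
    return islice(stream, low, high)

vertex_ids = range(1, 1001)

# The same "query" against two different storage layouts (seeds).
first_run = list(range_(limit(scan(vertex_ids, seed=1), 500), 0, 5))
second_run = list(range_(limit(scan(vertex_ids, seed=2), 500), 0, 5))

# The same layout scanned again: the result repeats.
repeat_run = list(range_(limit(scan(vertex_ids, seed=1), 500), 0, 5))
```

In this model `first_run` equals `repeat_run` but differs from `second_run`: repeated runs look stable only while the underlying layout stays the same, which matches what the question observed.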

    Since your use case requires a guaranteed order, I would recommend using the order step.

    For example:

    g.V().order().by(T.id).limit(10000).range(0,100)
    

    Every TinkerPop graph implementation should return the same result for this query every time, as long as the underlying data does not change. You pay the cost of ordering, but you get the consistent result you need.
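
Why ordering removes the dependence on storage can be shown with a small plain-Python model (again, not Gremlin). `first_page` is a hypothetical helper, and the seeded shuffle is an assumption standing in for a backend that returns vertices in arbitrary order.

```python
import random

def first_page(vertex_ids, seed, page_size):
    """Hypothetical model of g.V().order().by(T.id).range(0, page_size)."""
    ids = list(vertex_ids)
    random.Random(seed).shuffle(ids)  # assumption: backend returns ids in arbitrary order
    return sorted(ids)[:page_size]    # sorting fixes the sequence before slicing

# Two different storage layouts (seeds), same ordered query.
page_a = first_page(range(1, 1001), seed=1, page_size=100)
page_b = first_page(range(1, 1001), seed=2, page_size=100)
```

Both pages contain the ids 1 to 100 in ascending order, whatever the seed. The model leaves out limit(10000) because taking the first 100 of the first 10000 ordered results is the same as taking the first 100.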