Search code examples
sparqlgremlintinkerpoptinkerpop3janusgraph

Sparql-gremlin tool don't use indexes


I use sparql-gremlin 3.4.0 with janusgraph 0.3.1. After I create the index on vertex property 'iri', the gremlin query give an immediate result. Instead, if I excecute the same query in sparql, it not use any index. In the example below I use the force-index option to avoid scan query.

gremlin_sparql_query

Any suggestion?


Solution

  • There could be two issues to consider: (1) TinkerPop isn't properly optimizing that query to a state that JanusGraph can easily use an index or (2) JanusGraph isn't optimizing some aspect of the query to use the index. For the latter case, JanusGraph would have to nicely optimize match() step to use an index as that is the core step sparql-gremlin uses in its translation process. I'm pretty sure that it doesn't do that. Speaking to the former case, JanusGraph likely relies on TinkerPop to convert a match() into something easier to consume - in your example, hopefully JanusGraph would get to work with the query you wrote to initially test - g.V().has('iri', ...). I think explain() would show you what's going on there as it did for me when I tested a variation of your example with TinkerGraph:

    gremlin> s.sparql("SELECT ?x WHERE { ?x v:name 'marko' }").explain()
    ==>Traversal Explanation
    ===========================================================================================================================================================================================
    Original Traversal                 [InjectStep([SELECT ?x WHERE { ?x v:name 'marko' }])]
    
    ConnectiveStrategy           [D]   [InjectStep([SELECT ?x WHERE { ?x v:name 'marko' }])]
    SparqlStrategy               [D]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    MatchPredicateStrategy       [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    FilterRankingStrategy        [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    EarlyLimitStrategy           [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    InlineFilterStrategy         [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    IncidentToAdjacentStrategy   [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    AdjacentToIncidentStrategy   [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    RepeatUnrollStrategy         [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    CountStrategy                [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    PathRetractionStrategy       [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    LazyBarrierStrategy          [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    TinkerGraphCountStrategy     [P]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    TinkerGraphStepStrategy      [P]   [TinkerGraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    ProfileStrategy              [F]   [TinkerGraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    StandardVerificationStrategy [V]   [TinkerGraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    
    Final Traversal                    [TinkerGraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(x), PropertiesStep([name],value), IsStep(eq(marko)), MatchEndStep]]), SelectOneStep(last,x)]
    

    Not so good.

    So, the options for a solution are:

    1. JanusGraph needs to better optimize match() to handle this sort of query or
    2. TinkerPop standard traversal strategies should be better at converting such queries to more general patterns or
    3. sparql-gremlin should compile to Gremlin that matches more cleanly to existing traversal strategies

    On that last point, note what happens if sparql-gremlin had generated this match() query instead:

    gremlin> g.V().match(__.as('a').has('person','name','marko')).select('a').values('name').explain()
    ==>Traversal Explanation
    ==========================================================================================================================================================================================================
    Original Traversal                 [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(a), HasStep([~label.eq(person), name.eq(marko)]), MatchEndStep]]), SelectOneStep(last,a), PropertiesStep([name],v
                                          alue)]
    
    ConnectiveStrategy           [D]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(a), HasStep([~label.eq(person), name.eq(marko)]), MatchEndStep]]), SelectOneStep(last,a), PropertiesStep([name],v
                                          alue)]
    MatchPredicateStrategy       [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(a), HasStep([~label.eq(person), name.eq(marko)]), MatchEndStep]]), SelectOneStep(last,a), PropertiesStep([name],v
                                          alue)]
    FilterRankingStrategy        [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(a), HasStep([~label.eq(person), name.eq(marko)]), MatchEndStep]]), SelectOneStep(last,a), PropertiesStep([name],v
                                          alue)]
    EarlyLimitStrategy           [O]   [GraphStep(vertex,[]), MatchStep(AND,[[MatchStartStep(a), HasStep([~label.eq(person), name.eq(marko)]), MatchEndStep]]), SelectOneStep(last,a), PropertiesStep([name],v
                                          alue)]
    InlineFilterStrategy         [O]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), PropertiesStep([name],value)]
    IncidentToAdjacentStrategy   [O]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), PropertiesStep([name],value)]
    AdjacentToIncidentStrategy   [O]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), PropertiesStep([name],value)]
    RepeatUnrollStrategy         [O]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), PropertiesStep([name],value)]
    CountStrategy                [O]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), PropertiesStep([name],value)]
    PathRetractionStrategy       [O]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), NoOpBarrierStep(2500), PropertiesStep([name],value)]
    LazyBarrierStrategy          [O]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), NoOpBarrierStep(2500), PropertiesStep([name],value)]
    TinkerGraphCountStrategy     [P]   [GraphStep(vertex,[]), HasStep([~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), NoOpBarrierStep(2500), PropertiesStep([name],value)]
    TinkerGraphStepStrategy      [P]   [TinkerGraphStep(vertex,[~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), NoOpBarrierStep(2500), PropertiesStep([name],value)]
    ProfileStrategy              [F]   [TinkerGraphStep(vertex,[~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), NoOpBarrierStep(2500), PropertiesStep([name],value)]
    StandardVerificationStrategy [V]   [TinkerGraphStep(vertex,[~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), NoOpBarrierStep(2500), PropertiesStep([name],value)]
    
    Final Traversal                    [TinkerGraphStep(vertex,[~label.eq(person), name.eq(marko)])@[a], SelectOneStep(last,a), NoOpBarrierStep(2500), PropertiesStep([name],value)]
    

    Much better. So, I tend to think that this is a general issue for TinkerPop to solve and it involves some combination of those last two points. Of course, it would be nice if JanusGraph optimized match() further if they could. None of this is a solution to your problem of course, but it should at least explain what's going on and where the problem is. I've created TINKERPOP-2325 for further discussion and tracking.