gremlin, graph-databases, janusgraph

Or statement with Match statement in Gremlin


I have a Janusgraph database with the following schema:

(Journal)<-[PublishedIn]-(Paper)<-[AuthorOf]-(Author)

I'm trying to write a query using the Gremlin match() step that finds two different journals, the related papers that have a keyword in the title, and the authors of those papers. Here's what I have so far:

sg = g.V().match(
    __.as('a').has('Journal', 'displayName', textContains('Journal Name 1')),
    __.as('a').has('Journal', 'displayName', textContains('Journal Name 2')),
    __.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'), 
    __.as('b').has('Paper', 'paperTitle', textContains('My Key word')),
    __.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
 cap('sg').next()

This query runs successfully but returns 0 vertices and 0 edges. If I divide the query into two and search for each Journal displayName separately I get complete graphs, so I assume there's something wrong with the logic/syntax of my query.

If I write the query this way:

sg = g.V().or(has('JournalFixed', 'displayName', textContains('Journal Name 1')),
              has('JournalFixed', 'displayName', textContains('Journal Name 2'))).
              inE('PublishedInFixed').subgraph('sg').
              outV().has('Paper', 'paperTitle', textContains('My Key word')).
              inE('AuthorOf').subgraph('sg').
              outV().
              cap('sg').
              next()

it returns a network with around 7000 nodes. How can I rewrite this query to use the match() step?


Solution

  • I'm not sure this is your whole problem, but I think your match() models the two "displayName" patterns as an and() rather than an or(): both are bound to the same 'a' label, so a single vertex would have to satisfy both. You can check with profile(), as I did here with TinkerGraph:

    gremlin> g.V().match(__.as('a').has('name','marko'), __.as('a').has('name','josh')).profile()
    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    TinkerGraphStep(vertex,[name.eq(marko), name.eq...                                             0.067   100.00
                                                >TOTAL                     -           -           0.067        -
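
    To make the and() semantics concrete, here is a small sketch on TinkerGraph's "modern" toy graph (assuming graph = TinkerFactory.createModern() and g = graph.traversal() in the Gremlin Console; that setup is my assumption, not something shown above). Two has() patterns on the same label behave like chained has() steps, which no single vertex can satisfy, whereas or() accepts either name:

    ```groovy
    // both conditions must hold on the same vertex, so nothing matches
    gremlin> g.V().has('name','marko').has('name','josh').count()
    ==>0
    // either condition may hold, so both vertices match
    gremlin> g.V().or(has('name','marko'), has('name','josh')).count()
    ==>2
    ```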
    

    You could resolve this in a number of ways, I suppose. For my example, using within(), as described in a different answer to an earlier question of yours, works nicely:

    gremlin> g.V().match(__.as('a').has('name', within('marko','josh'))).profile()
    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    TinkerGraphStep(vertex,[name.within([marko, jos...                     2           2           0.098   100.00
                                                >TOTAL                     -           -           0.098        -
    

    For your case, I would replace:

    or(has('JournalFixed', 'displayName', textContains('Journal Name 1')),
       has('JournalFixed', 'displayName', textContains('Journal Name 2')))
    

    with:

    has('JournalFixed', 'displayName', textContains('Journal Name 1').
                                       or(textContains('Journal Name 2')))
    

    essentially taking advantage of P.or(). I think that either of these options should be better than using the or() step up front, but only a profile() on JanusGraph would tell, as discussed here.
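
    As a quick illustration of P.or() on its own, here is the same idea on the TinkerGraph "modern" toy graph (again assuming g is a traversal source over TinkerFactory.createModern()). It uses eq() rather than textContains(), since the latter is a JanusGraph predicate:

    ```groovy
    // a single has() step whose predicate is "eq marko OR eq josh"
    gremlin> g.V().has('name', eq('marko').or(eq('josh'))).count()
    ==>2
    ```

    The or-logic lives inside the predicate, so the filter remains a single has() step rather than a separate or() step wrapping two child traversals.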

    All that said, I'd wonder why your or() could not be translated directly into the match() as follows:

    g.V().match(
        __.as('a').or(has('Journal', 'displayName', textContains('Journal Name 1')),
                      has('Journal', 'displayName', textContains('Journal Name 2'))),
        __.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'), 
        __.as('b').has('Paper', 'paperTitle', textContains('My Key word')),
        __.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
     cap('sg')
    

    I'd imagine though that my suggestion of P.or() is significantly more performant.
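
    Putting the two ideas together, a match() version using P.or() might look like the sketch below. It is untested against JanusGraph: it assumes textContains() composes with P.or() as suggested above, and it uses the 'Journal' and 'PublishedIn' labels from the first query in the question.

    ```groovy
    sg = g.V().match(
        // one pattern for 'a': the journal name contains either string
        __.as('a').has('Journal', 'displayName',
                       textContains('Journal Name 1').or(textContains('Journal Name 2'))),
        __.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'),
        __.as('b').has('Paper', 'paperTitle', textContains('My Key word')),
        __.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
      cap('sg').next()
    ```

    Running profile() on the same match() without the cap('sg').next() tail should show whether JanusGraph pushes the combined predicate down to an index.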