graph gremlin tinkerpop amazon-neptune gremlinpython

Gremlin / TinkerPop: insert a constant key:value property into every vertex and edge traversed, then return the path


I am using AWS Neptune with the Gremlin query language (latest version).

Using sample data inserted this way:

g.addV('test_report').property('name', 'REF').property('creationDateTime','2022-07-01 00:00:00.000000')
g.addV('test_reportrelease').property('name', 'A').property('creationDateTime','2022-07-01 01:00:00.000000')
g.addV('test_reportrelease').property('name', 'B').property('creationDateTime','2022-07-01 02:00:00.000000')
g.addV('test_reportrelease').property('name', 'C').property('creationDateTime','2022-07-01 03:00:00.000000')

g.addE('test_has').property('creationDateTime','2022-07-02 01:00:00.000000')
.from(V().hasLabel('test_report').has('name', 'REF'))
.to(V().hasLabel('test_reportrelease').has('name', 'A'))

g.addE('test_has').property('creationDateTime','2022-07-02 02:00:00.000000')
.from(V().hasLabel('test_report').has('name', 'REF'))
.to(V().hasLabel('test_reportrelease').has('name', 'B'))

g.addE('test_has').property('creationDateTime','2022-07-02 03:00:00.000000')
.from(V().hasLabel('test_report').has('name', 'REF'))
.to(V().hasLabel('test_reportrelease').has('name', 'C'))

What I want is:

  • First, get the vertices with the label "test_report".
  • Then follow all of the branches below (union).
  • Follow, if any exist, all outgoing edges (outE) with the label "test_has" between vertices labeled "test_report" and "test_reportrelease", then their incoming vertices (inV), and attach a constant named "ref" with the value "test_has" to every edge traversed.
  • Follow, if any exist, the same "test_has" edges and incoming vertices, but keep only the first result in ascending order of creationDateTime, and attach a constant named "ref" with the value "test_first" to every edge traversed.
  • Follow, if any exist, the same "test_has" edges and incoming vertices, but keep only the first result in descending order of creationDateTime, and attach a constant named "ref" with the value "test_last" to every edge traversed.
  • Finally, use the tree() step to return all traversed vertices and edges as a tree.

The main problem is adding the "ref" constant to every edge followed (or changing the label at query time).

The sample query below covers most of my needs but lacks the "ref" constant:

g.V().hasLabel('test_report')
.union(
  optional(outE().hasLabel('test_has').order().by('creationDateTime').inV()),
  optional(outE().hasLabel('test_has').order().by('creationDateTime').limit(1).inV()),
  optional(outE().hasLabel('test_has').order().by(coalesce(values('creationDateTime'), constant('')), desc).limit(1).store('last').inV())
).valueMap(true).path()

Question: how can I insert a constant key:value property on every edge or vertex traversed?

The result should look like this (but formatted as a tree; path is more readable when testing):

1   path[v[70c0dcd5-a6b9-4532-28bf-85705e94697e], e[66c0dcd7-31ed-e381-5331-e8c73bb91be1][70c0dcd5-a6b9-4532-28bf-85705e94697e-test_has->c0c0dcd5-c102-4031-d739-b5bd8fe161bc], v[c0c0dcd5-c102-4031-d739-b5bd8fe161bc], {<T.id: 1>: 'c0c0dcd5-c102-4031-d739-b5bd8fe161bc', <T.label: 4>: 'test_reportrelease', 'name': ['A'], 'creationDateTime': ['2022-07-01 01:00:00.000000'], 'ref': ['test_has']}]
2   path[v[70c0dcd5-a6b9-4532-28bf-85705e94697e], e[b0c0dcd7-4d99-1f3c-077d-decd2e251c46][70c0dcd5-a6b9-4532-28bf-85705e94697e-test_has->68c0dcd5-c5c5-c869-d789-96acbb88131f], v[68c0dcd5-c5c5-c869-d789-96acbb88131f], {<T.id: 1>: '68c0dcd5-c5c5-c869-d789-96acbb88131f', <T.label: 4>: 'test_reportrelease', 'name': ['B'], 'creationDateTime': ['2022-07-01 02:00:00.000000'], 'ref': ['test_has']}]
3   path[v[70c0dcd5-a6b9-4532-28bf-85705e94697e], e[70c0dcd7-72ac-204f-1341-cc843d165a38][70c0dcd5-a6b9-4532-28bf-85705e94697e-test_has->5ac0dcd5-cb10-f392-7d80-69541c4f22eb], v[5ac0dcd5-cb10-f392-7d80-69541c4f22eb], {<T.id: 1>: '5ac0dcd5-cb10-f392-7d80-69541c4f22eb', <T.label: 4>: 'test_reportrelease', 'name': ['C'], 'creationDateTime': ['2022-07-01 03:00:00.000000'], 'ref': ['test_has']}]
4   path[v[70c0dcd5-a6b9-4532-28bf-85705e94697e], e[66c0dcd7-31ed-e381-5331-e8c73bb91be1][70c0dcd5-a6b9-4532-28bf-85705e94697e-test_has->c0c0dcd5-c102-4031-d739-b5bd8fe161bc], v[c0c0dcd5-c102-4031-d739-b5bd8fe161bc], {<T.id: 1>: 'c0c0dcd5-c102-4031-d739-b5bd8fe161bc', <T.label: 4>: 'test_reportrelease', 'name': ['A'], 'creationDateTime': ['2022-07-01 01:00:00.000000'], 'ref': ['test_first']}]
5   path[v[70c0dcd5-a6b9-4532-28bf-85705e94697e], e[70c0dcd7-72ac-204f-1341-cc843d165a38][70c0dcd5-a6b9-4532-28bf-85705e94697e-test_has->5ac0dcd5-cb10-f392-7d80-69541c4f22eb], v[5ac0dcd5-cb10-f392-7d80-69541c4f22eb], {<T.id: 1>: '5ac0dcd5-cb10-f392-7d80-69541c4f22eb', <T.label: 4>: 'test_reportrelease', 'name': ['C'], 'creationDateTime': ['2022-07-01 03:00:00.000000'], 'ref': ['test_last']}]

I have tried these two approaches, but both interrupt the graph traversal when using valueMap:

g.V().hasLabel('test_report').outE().hasLabel('test_has')
.order().by('creationDateTime').limit(1).valueMap(true).unfold().inject(['ref':'test_first']).fold()

And

g.V().hasLabel('test_report').outE().hasLabel('test_has')
.order().by('creationDateTime').limit(1).union(valueMap(true).unfold(), project('ref').by(constant('test_first'))).fold()

Is there a way to achieve this?

I don't want the results stored in the database; I only need the values in the query results.

PS: I'm querying my graph db from a Jupyter SageMaker Notebook.


Solution

  • Your case sounds somewhat similar to one that came up on the Gremlin Users list recently.

    Using the air-routes data set, you can add extra key/value pairs to a result set with a query such as this one:

    gremlin> g.V().has('code','DAL').
    ......1>       union(valueMap('city','desc').by(unfold()),
    ......2>             constant(['special':1234])).unfold().
    ......3>       group().
    ......4>         by(keys).
    ......5>         by(values)
    
    ==>[special:[1234],city:[Dallas],desc:[Dallas Love Field]]     
    

    You should be able to use a similar construct in your query. The query above reads some properties and their values from a vertex, then supplements them with a constant step that adds the special key with the value 1234 to the result. The unfold step exposes each entry ("row") of the two maps individually, and the group step builds a single new map containing all three key/value pairs.
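To make the data flow of the union / unfold / group query above concrete, here is a plain-Python model of what happens to the maps. It does not use Gremlin at all; `merge_maps` is a hypothetical helper written only for illustration, and the two input dictionaries simply mirror the values shown in the console output.

```python
from collections import defaultdict


def merge_maps(*maps):
    """Plain-Python model of union(...).unfold().group().by(keys).by(values)."""
    merged = defaultdict(list)
    for m in maps:                     # union: each branch emits one map
        for key, value in m.items():   # unfold: one traverser per key/value entry
            merged[key].append(value)  # group: collect the values under each key
    return dict(merged)


# Stand-in for valueMap('city','desc').by(unfold()) on the DAL vertex.
vertex_props = {"city": "Dallas", "desc": "Dallas Love Field"}
# Stand-in for constant(['special':1234]).
extra = {"special": 1234}

result = merge_maps(vertex_props, extra)
print(result)
# {'city': ['Dallas'], 'desc': ['Dallas Love Field'], 'special': [1234]}
```

Because group collects values per key, every value ends up wrapped in a list, which is why the Gremlin output shows city:[Dallas] rather than city:Dallas. The key order in the Gremlin result may differ from the order printed here.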

    UPDATED

    Here is a second example that shows how to let the query accumulate the results and do some additional traversing afterwards.

    gremlin> g.V().has('code','DAL').
    ......1>     sideEffect(
    ......2>       union(valueMap('city','desc').by(unfold()),
    ......3>             constant(['special':1234])).unfold().
    ......4>       group('x').
    ......5>         by(keys).
    ......6>         by(values)).
    ......7>       project('map','count').
    ......8>         by(select('x')).
    ......9>         by(out().count())  
    
    ==>[map:[special:[1234],city:[Dallas],desc:[Dallas Love Field]],count:57]   
    

    FURTHER UPDATED

    The same result can be obtained with a sack instead of sideEffect:

    gremlin> g.V().has('code','DAL').
    ......1>     sack(assign).
    ......2>       by(union(valueMap('city','desc').by(unfold()),
    ......3>                constant(['special':1234])).unfold().
    ......4>          group().
    ......5>           by(keys).
    ......6>           by(values)).
    ......7>       project('map','count').
    ......8>         by(sack()).
    ......9>         by(out().count()) 
    
    ==>[map:[special:[1234],city:[Dallas],desc:[Dallas Love Field]],count:57]
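
As a rough sketch of how the same idea might carry over to the three branches in the question, the Python below only builds the Gremlin query text. It swaps in the standard project() step in place of the union / group construct shown above, so that each result stays a single map per vertex and path() can still be called afterwards. The names `BRANCHES`, `tagged_branch` and `build_query`, and the 'element' key, are hypothetical. The coalesce and store steps from the original query are left out for brevity, and the generated query has not been run against Neptune.

```python
# Builds query text only; nothing here contacts a graph database.
BRANCHES = [
    # (value for the 'ref' constant, steps that select the edges of the branch)
    ("test_has", "outE('test_has')"),
    ("test_first", "outE('test_has').order().by('creationDateTime').limit(1)"),
    ("test_last", "outE('test_has').order().by('creationDateTime', desc).limit(1)"),
]


def tagged_branch(ref, edge_steps):
    """Follow the edges, then wrap the target vertex in a map carrying 'ref'."""
    return (
        f"optional({edge_steps}.inV()"
        f".project('ref', 'element').by(constant('{ref}')).by(valueMap(true)))"
    )


def build_query(branches=BRANCHES):
    """Assemble the union of all tagged branches into one Gremlin query string."""
    body = ",\n    ".join(tagged_branch(ref, steps) for ref, steps in branches)
    return f"g.V().hasLabel('test_report').\n  union(\n    {body}\n  ).path()"


query = build_query()
print(query)
```

The printed text can be pasted into a `%%gremlin` cell in a Neptune notebook or submitted as a string through a gremlinpython client. Note that 'ref' ends up next to the vertex map (under 'element') rather than merged into it, which is slightly different from the output requested in the question.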