I'm trying to write a Java wrapper that will help me upsert (or even just insert) vertices in a Gremlin server.
I realize that different tools may support other methods, e.g. AWS Neptune (which is my primary target backend) has a bulk loading API that reads data from S3, but I'm trying to avoid things that are only supported by one implementation as my requirements may evolve to include supporting other backends (and I'd like to support at least TinkerGraph, for unit testing). I don't plan on upserting many vertices per batch; this is not for populating a huge graph from scratch, this is for adding to an existing graph from a stream of data.
The TinkerPop documentation mentions addVertex but discourages its use, recommending g.addV() instead. My problem with g.addV(), and the Gremlin DSL in general, is that although it's not too hard to write one-off queries with it (... assuming you understand Gremlin well, which I don't), I can't figure out how to build queries dynamically.
Given as input a set of vertices and their properties, with arbitrary cardinality (let's say it comes from a CSV file of anywhere from 1 to 100 lines), how do I dynamically build a graph traversal that upserts all of the vertices? Alternate question: Is there a backend-agnostic tool for loading a few hundred vertices' worth of data into a local or remote graph?
This seems to be a Java issue as much as a Gremlin one. I have tried building a graph traversal by composing functions with Function#andThen(Function) but I am quickly running into issues with Java's generics because graph traversal methods return GraphTraversal<S, E> where both S and E depend on the actual graph traversal method. E.g. addV() returns GraphTraversal<S, Vertex> and addE() returns GraphTraversal<S, Edge>.
In Java (or any GLV), you can build up a query in code and then submit it by appending a terminal step [1]:
query = g.addV('test').property(id,'v1')
query = query.addV('test').property(id,'v2')
query = query.addV('test').property(id,'v3')
// so on and so forth - or you can do this in a loop
// and then submit it with
result = query.iterate()
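On the generics concern from the question: the loop version avoids it if the accumulator variable is declared with wildcards instead of being threaded through Function#andThen. The sketch below shows only the Java typing pattern. Step is a hypothetical stand-in that mimics the shape of GraphTraversal<S, E> (it is not a TinkerPop class), and DynamicBuild and build are likewise made-up names. My assumption is that a variable declared as GraphTraversal<?, ?> behaves the same way with the real API, but that is worth verifying against your TinkerPop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a fluent traversal whose end type E changes per step,
// mirroring the shape of GraphTraversal<S, E>. Not a TinkerPop class.
final class Step<S, E> {
    final List<String> log; // records the steps appended so far

    Step(List<String> log) {
        this.log = log;
    }

    // Like addV(): whatever E was before, the result ends in a "vertex" (modelled as String).
    Step<S, String> addV(String label) {
        List<String> next = new ArrayList<>(log);
        next.add("addV(" + label + ")");
        return new Step<>(next);
    }

    // Like property(): keeps the current end type E.
    Step<S, E> property(String key, Object value) {
        List<String> next = new ArrayList<>(log);
        next.add("property(" + key + ", " + value + ")");
        return new Step<>(next);
    }
}

final class DynamicBuild {
    // The accumulator is declared with wildcards, so the result of every step
    // can be assigned back to it regardless of how that step changes E.
    static Step<?, ?> build(Step<?, ?> start, List<String[]> rows) {
        Step<?, ?> t = start;
        for (String[] row : rows) {
            // row[0] is the label, row[1] the id (assumed layout for this sketch)
            t = t.addV(row[0]).property("id", row[1]);
        }
        return t;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"test", "v1"},
                new String[] {"test", "v2"});
        Step<?, ?> t = build(new Step<Object, Object>(new ArrayList<>()), rows);
        System.out.println(t.log);
    }
}
```

The trade-off is that the wildcard type throws away the end type, which is fine when the traversal is only going to be iterated for its side effects.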
There are other (perhaps more clever) methods of doing this by injecting a list of maps:
g.inject([
[ id: 'v347', label: 'test', name: 'Son' ],
[ id: 'v348', label: 'test', name: 'Messi' ],
[ id: 'v349', label: 'test', name: 'Suarez' ],
[ id: 'v350', label: 'test', name: 'Kane' ]
]).unfold().
addV(select('label')).
property(id,select('id')).
property('name',select('name'))
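Since the question mentions CSV input, the rows first have to be turned into the list-of-maps shape used above. Here is a minimal sketch of that step in plain Java, assuming a header row, comma separators, and no quoted or escaped fields. CsvRows and toMaps are hypothetical names, not part of TinkerPop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of TinkerPop: converts CSV lines into the
// list-of-maps shape that the inject() traversal above unfolds.
final class CsvRows {

    // Assumes the first line is a header, fields are comma-separated,
    // and no field contains quotes or embedded commas.
    static List<Map<String, Object>> toMaps(List<String> lines) {
        List<Map<String, Object>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            if (cells.length != header.length) {
                throw new IllegalArgumentException("Unexpected column count: " + line);
            }
            // LinkedHashMap keeps the column order of the header
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i].trim(), cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
                "id,label,name",
                "v347,test,Son",
                "v348,test,Messi");
        System.out.println(toMaps(csv));
    }
}
```

The resulting list would then be passed to inject() as a single argument, so that unfold() emits one map per row.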
TinkerPop 3.6 also introduces the mergeV() and mergeE() steps [2]. However, at the time of this writing, the latest Neptune engine only supports up to TinkerPop 3.5.3 (3.6 support is forthcoming).
[1] https://tinkerpop.apache.org/docs/current/reference/#terminal-steps
[2] https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step