Tags: gremlin, tinkerpop, amazon-neptune, tinkerpop3, gremlin-server

Gremlin-server: does vertexIdManager=ANY generate id collisions?


I need to configure the tinkergraph-empty.properties file in gremlin-server so that the ids are generated with the "ANY" logic:

gremlin.tinkergraph.vertexIdManager=ANY
gremlin.tinkergraph.edgeIdManager=ANY
gremlin.tinkergraph.vertexPropertyIdManager=ANY

I tested this configuration with an empty database. When I created some vertices, I noticed that the ids are numbers represented as strings and that the numbers are not sequential. Does this mean there will be id collisions? What is the logic behind this?

I need this configuration because my local gremlin-server has to be compatible with the data in Amazon Neptune, and only ANY is compatible with the Neptune ids. I need to be able to load the Neptune database content into a localhost gremlin-server, do some operations on it, and maybe then transfer it back to Neptune without issues.

I'm worried about collisions because they destroy data, and if my data is destroyed, so is my project.


Solution

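  • The reason you may not see sequential IDs is that properties also have IDs and, in the case of TinkerGraph, these are taken from the same pool as the vertex IDs. A gap in the vertex IDs therefore means that a property took the missing number, not that IDs collide. You can also see this locally using the Gremlin Console: in the session below, the property 'x' receives ID 3, so the next vertex gets ID 4.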

    gremlin> conf = new BaseConfiguration();[]
    gremlin> conf.setProperty("gremlin.tinkergraph.vertexIdManager","ANY");[]
    gremlin> conf.setProperty("gremlin.tinkergraph.edgeIdManager","ANY");[]
    gremlin> conf.setProperty("gremlin.tinkergraph.vertexPropertyIdManager","ANY");[]
    gremlin> g = TinkerGraph.open(conf).traversal()
    ==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
    
    gremlin> g.addV('test')
    ==>v[0]
    gremlin> g.addV('test')
    ==>v[1]
    gremlin> g.addV('test').property('x',1)
    ==>v[2]
    gremlin> g.addV('test')
    ==>v[4]
    gremlin> g.V().has('x').properties('x').id()
    ==>3       
    

    Updated 2022/01/16 to address additional questions in the comments.

    A file (JSON or GraphML) loaded using g.io can contain user-provided IDs. These will work unless the ID is already in use. Duplicate IDs are not allowed, and an error will be thrown if any are encountered. Only properties are automatically given IDs during file loading.
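
The behavior described in the answer (one ID pool shared by vertices and properties, and an error on duplicate user-provided IDs) can be pictured with a small stand-alone model. This is only a sketch of the idea, not TinkerGraph's actual code: the class SharedIdPoolSketch and its methods next and claim are hypothetical names invented for illustration, and the model assumes that generated IDs skip any ID that is already in use.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a single ID pool shared by vertices and properties (hypothetical, not TinkerGraph code). */
class SharedIdPoolSketch {
    private long counter = -1;                       // first generated ID will be 0
    private final Set<Object> used = new HashSet<>(); // every ID handed out or claimed so far

    /** Generate the next free ID, skipping any ID that is already in use (assumed behavior). */
    long next() {
        long id;
        do {
            id = ++counter;
        } while (used.contains(id));
        used.add(id);
        return id;
    }

    /** Register a user-provided ID, e.g. one read from a file; duplicates are rejected. */
    void claim(Object id) {
        if (!used.add(id)) {
            throw new IllegalArgumentException("ID already in use: " + id);
        }
    }

    public static void main(String[] args) {
        SharedIdPoolSketch pool = new SharedIdPoolSketch();
        System.out.println("v[" + pool.next() + "]");         // vertex without properties
        System.out.println("v[" + pool.next() + "]");         // vertex without properties
        System.out.println("v[" + pool.next() + "]");         // vertex that carries property 'x'
        System.out.println("property id " + pool.next());     // the property consumes an ID too
        System.out.println("v[" + pool.next() + "]");         // next vertex: one number was skipped

        pool.claim(10L);                                       // user-provided ID, accepted once
        try {
            pool.claim(10L);                                   // same ID again, rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the vertex IDs come out as 0, 1, 2 and 4, mirroring the console session in the answer: the gap is the ID consumed by the property, and a second attempt to use the same user-provided ID fails loudly instead of overwriting data.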