azure-cosmosdb gremlin

CosmosDB Graph : "upsert" query pattern


I am new to the Gremlin query language. I need to insert data into a Cosmos DB graph (using the Gremlin.Net package) whether or not the vertex (or edge) already exists in the graph. If it already exists, I only need to update its properties. I wanted to use this kind of pattern:

g.V().hasLabel('event').has('id','1').tryNext().orElseGet {g.addV('event').property('id','1').next()}

But it is not supported by Gremlin.Net / the Cosmos DB Graph API. Is there a way to do a kind of upsert in a single query?

Thanks in advance.


Solution

  • There are a number of ways to do this, but I think the TinkerPop community has generally settled on this approach:

    g.V().has('event','id','1').
      fold().
      coalesce(unfold(),
               addV('event').property('id','1'))
    

    Basically, it looks for the "event" vertex with has() and uses the fold() step to coerce the result to a list, which will either be empty or contain a Vertex. Then coalesce() tries to unfold() that list: if it holds a Vertex, that Vertex is returned immediately; otherwise, the addV() runs.
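
    To see why the fold() matters, here is a rough plain-Python model of the three steps. It is not Gremlin or any TinkerPop API; the function names and the list-of-dicts "graph" are made up for illustration. It assumes the usual stream semantics: coalesce() runs once per incoming traverser, so an empty has() result gives it nothing to act on, while fold() always emits exactly one list.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Vertex = Dict[str, Any]
Child = Callable[[Any], Iterable[Any]]  # a child traversal applied to one traverser


def fold(stream: Iterable[Any]) -> Iterator[List[Any]]:
    """Model of fold(): always emits exactly one traverser, a (possibly empty) list."""
    yield list(stream)


def unfold(items: List[Any]) -> Iterator[Any]:
    """Model of unfold(): emits each list element as its own traverser."""
    yield from items


def coalesce(stream: Iterable[Any], *children: Child) -> Iterator[Any]:
    """Model of coalesce(): per incoming traverser, emit the results of the
    first child traversal that produces anything."""
    for traverser in stream:
        for child in children:
            results = list(child(traverser))
            if results:
                yield from results
                break


def upsert(graph: List[Vertex], label: str, vid: str, use_fold: bool = True) -> List[Vertex]:
    """Hypothetical helper mirroring has(...).fold().coalesce(unfold(), addV(...))."""
    matches = [v for v in graph if v["label"] == label and v["id"] == vid]

    def add_v(_: Any) -> Iterator[Vertex]:
        # Stand-in for addV(label).property('id', vid).
        vertex = {"label": label, "id": vid}
        graph.append(vertex)
        yield vertex

    if use_fold:
        return list(coalesce(fold(matches), unfold, add_v))
    # Without fold(): an empty match stream means coalesce() never runs.
    return list(coalesce(matches, lambda v: [v], add_v))
```

    In this model, calling upsert() with use_fold=False on an empty graph returns nothing and adds nothing, whereas the folded version creates the vertex on the first call and returns the existing one on the second.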

    If the idea is also to update properties when the element already exists, just add property() steps after the coalesce(); they are applied whether the vertex was found or newly created:

    g.V().has('event','id','1').
      fold().
      coalesce(unfold(),
               addV('event').property('id','1')).
      property('description','This is an event')
    

    If you need to know whether the returned vertex is "new" or not, you could do something like this:

    g.V().has('event','id','1').
      fold().
      coalesce(unfold().
               project('vertex','exists').
                 by(identity()).
                 by(constant(true)),
               addV('event').property('id','1').
               project('vertex','exists').
                 by(identity()).
                 by(constant(false)))
    

    Additional reading on this topic can be found on this question: "Why do you need to fold/unfold using coalesce for a conditional insert?"

    Also note that optional edge insertion is described here: "Add edge if not exist using gremlin".

    As a final note, while this question was asked about CosmosDB, the answer generally applies to all TinkerPop-enabled graphs. How a graph optimizes this Gremlin is a separate question. If a graph has native upsert capabilities, they may or may not be used behind the scenes of this Gremlin, so there may be better ways to implement upsert through the graph system's native API (though choosing that path reduces the portability of your code).

    UPDATE: As of TinkerPop 3.6.0, the fold()/coalesce()/unfold() pattern has been largely replaced by the new steps of mergeV() and mergeE() which greatly simplify the Gremlin required to do an upsert-like operation. Under 3.6.0 and newer versions, you would replace the first example with:

    g.mergeV([(label): 'event', id: '1'])
    

    or perhaps better, treat the property key named "id" as the actual vertex identifier, T.id (I've added the property key "name" to help with the example):

    g.mergeV([(label): 'event', (id): '1', name: 'stephen'])
    

    The above will search for a vertex with the T.label, T.id and "name" in the Map. If it finds it, it is returned. If it does not find it, the Vertex is created using those values. If you have the T.id it may be even better to do:

    g.mergeV([(id): '1']).
        option(onCreate, [(label): 'event', name: 'stephen'])
    

    This limits the search criteria to just the identifier, which is enough to uniquely identify the vertex and avoids additional filters. If the vertex is not found, onCreate is triggered and the supplied Map is used, in conjunction with the search criteria Map, to create the vertex.
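
    The match-or-create behavior described above can be sketched in plain Python. This is only a toy model, not the TinkerPop implementation: merge_v and the list-of-dicts "graph" are hypothetical names, and it assumes, as described above, that the onCreate Map is combined with the search Map on creation. The on_match argument is modeled on TinkerPop's option(onMatch, ...), which updates properties on a vertex that was found.

```python
from typing import Any, Dict, List, Optional

Vertex = Dict[str, Any]


def merge_v(
    graph: List[Vertex],
    search: Dict[str, Any],
    on_create: Optional[Dict[str, Any]] = None,
    on_match: Optional[Dict[str, Any]] = None,
) -> List[Vertex]:
    """Toy model of mergeV(search).option(onCreate, ...).option(onMatch, ...)."""
    # A vertex matches only if it carries every key/value in the search map.
    matches = [v for v in graph if all(v.get(k) == val for k, val in search.items())]
    if matches:
        for vertex in matches:
            vertex.update(on_match or {})  # update properties on existing vertices
        return matches
    # Nothing matched: create from the search map combined with on_create (assumption).
    vertex = {**search, **(on_create or {})}
    graph.append(vertex)
    return [vertex]
```

    Calling merge_v(graph, {"id": "1"}, on_create={"label": "event", "name": "stephen"}) twice leaves a single vertex in the graph; on the second call the on_create values are ignored and only on_match, if given, is applied.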