gremlin tinkerpop janusgraph tinkerpop3 gremlinpython

Check if a vertex exists; if it doesn't, create it using an injected list of properties


UPDATE: I feel silly. I decided to just query the database for a list of all the names, list_of_names_in_db = g.V().hasLabel('Person').values('name').toList(), then compare list_of_names_in_db to batch and add vertices only for the names that are not already in the database.
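The approach in the update could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the helper names names_missing_from_db and add_people are hypothetical, and g is assumed to be an already connected gremlin-python traversal source. Note that it reads every name into memory and that the check and the insert are separate requests, so it is not safe against concurrent writers.

```python
def names_missing_from_db(g, batch, label="Person"):
    """Return the batch entries whose 'name' is not yet stored under `label`.

    `g` is assumed to be a connected gremlin-python traversal source.
    """
    # One round trip: fetch every existing name for this label.
    existing = set(g.V().hasLabel(label).values("name").toList())
    return [person for person in batch if person["name"] not in existing]


def add_people(g, people, label="Person"):
    """Create one vertex per dict, copying each key/value pair as a property."""
    for person in people:
        t = g.addV(label)
        for key, value in person.items():
            t = t.property(key, value)
        # iterate() sends the traversal to the server without returning results.
        t.iterate()
```

With the batch from the question, add_people(g, names_missing_from_db(g, batch)) would create vertices only for the names that are not already present.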


I'm a little over a week into learning Gremlin-Python, and I have a table of vertices that I'm looping through. For each vertex in the table, I check whether it already exists and, if it doesn't, create it. I've seen a bunch of examples, but none of them uses inject() in combination with coalesce(). Is that not possible, or am I just doing it wrong? Here's what I've tried:

from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.process.traversal import Column
from gremlin_python.process.anonymous_traversal import traversal

batch = [
    {
        'name': 'John',
        'age': 20,
        'height': 67,
        'weight': 140,
        'blood-type': 'B+',
        'state': 'PA',
        'email': 'Johnny5@gmail.com'
    },
    {
        'name': 'Steve',
        'age': 25,
        'height': 60,
        'weight': 110,
        'blood-type': 'B+',
        'state': 'CA',
        'email': 'DidIDoThat@gmail.com'
    }
]

(g.inject(batch).as_('data').
    coalesce(
        __.V().has('Person','name',__.select('data').unfold().select('name')),
        __.addV('Person').as_('P').
        select('data').unfold().as_('kv').
        select('P').property(
            __.select('kv').by(Column.keys),
            __.select('kv').by(Column.values))
        ).iterate())

The issue is that this creates duplicate entries every time the query is run. I think it's because of where inject() sits relative to coalesce(), but I'm not sure.

Side question: I've included my imports because I still have to put '__' in front of some of my steps, even though I've imported it with from gremlin_python.process.graph_traversal import __. The same goes for 'keys': I have to write Column.keys. Am I missing a step?


Solution

  • The has() step cannot take a traversal as the value to compare against, so the query does not work as you might expect. The recently released TinkerPop 3.6 adds a new mergeV() step that makes what you are trying to do much simpler. Until graph database providers move up to that TinkerPop version, you will still need some combination of map injection and coalesce(). In this case, instead of has(), use the where()...by() construct to build the test for existence.

    If possible, rather than checking for the existence of a name property, it would be better to check for the existence of a known, unique vertex ID and, if it is not found, create the vertex.

    Using the air-routes data set, here is what a slightly simplified form of that map injection plus where...by pattern looks like:

    gremlin> g.inject(['code':'AUS']).as('codes').
    ......1>   V().as('v').
    ......2>   where(eq('v')).
    ......3>     by(select('codes').select('code')).
    ......4>     by('code')
    ==>v[3]
    

    But as mentioned, I would, if possible, look for the existence of a known ID and use the map to create the vertex only if it is not found.