Tags: python, neo4j, cypher, py2neo

py2neo not enforcing uniqueness constraints in Neo4j database


I have a Neo4j database with nodes labeled "Program" and "Session". In the database I've enforced uniqueness constraints on the "name" and "href" properties. From :schema:

Constraints
ON (program:Program) ASSERT program.href IS UNIQUE
ON (program:Program) ASSERT program.name IS UNIQUE
ON (session:Session) ASSERT session.name IS UNIQUE
ON (session:Session) ASSERT session.href IS UNIQUE

I want to periodically query another API (storing each item's name and API endpoint href as node properties) and only add new nodes when they're not already in the database.

This is how I'm creating the nodes:

newprogram, = graph_db.create(node(name = programname, href = programhref))
newprogram.add_labels('Program')

newsession, = graph_db.create(node(name = sessionname, href = sessionhref))
newsession.add_labels('Session')

I'm running into the following error:

Traceback (most recent call last):
  File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/Users/jedc/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/Users/jedc/appfolder/applicationapis.py", line 42, in post
    newprogram.add_labels('Program')
  File "/Users/jedc/appfolder/py2neo/util.py", line 99, in f_
    return f(*args, **kwargs)
  File "/Users/jedc/appfolder/py2neo/core.py", line 1638, in add_labels
    if err.response.status_code == BAD_REQUEST and err.cause.exception == 'ConstraintViolationException':
AttributeError: 'ConstraintViolationException' object has no attribute 'exception'

My thought was that if I tried to add nodes that were already in the database, they just wouldn't be added.

I wrapped the creation/add_labels lines in a try/except AttributeError block, but when I did that I managed to duplicate everything that was already in the database, even though the constraints shown above were in place. (?!?) (How can py2neo manage to violate those constraints??)

I'm really confused, and would appreciate any help in figuring out how to add nodes only when they don't already exist.


Solution

  • The problem seems to be that you are first creating nodes without a label and then adding the label afterwards.

    That is

    graph_db.create(node(name = programname, href = programhref))
    

    and

    graph_db.create(node(name = sessionname, href = sessionhref))
    

    This first creates nodes without any labels. Because the constraints only apply to nodes labeled Program or Session, these unlabeled nodes don't violate them, so the create call succeeds.

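    Once you call newprogram.add_labels('Program') and newsession.add_labels('Session'), Neo4j attempts to add the labels and rejects the change with a ConstraintViolationException, because a node with the same name or href already carries that label. The AttributeError in your traceback comes from py2neo's own handling of that exception (it looks for err.cause.exception, which the exception object doesn't have), so it hides the real constraint error.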

    So py2neo isn't really violating your constraints, although it may look as if it's creating duplicate nodes. If you inspect them, I'm fairly sure you'll find that one set of nodes has the labels and the other set does not.
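To make that mechanism concrete, here is a toy in-memory model of a uniqueness constraint that only applies to nodes carrying a given label. It is a hypothetical illustration (ToyGraph, ConstraintViolation and create_then_label are made-up names, not py2neo or Neo4j APIs) and does not reflect how Neo4j is implemented internally. Running the create-then-label pattern twice leaves one labeled node plus one unlabeled "duplicate", while supplying the label at creation time rejects the duplicate without storing anything.

```python
# Toy model of a uniqueness constraint that only applies to nodes with a given label.
# Hypothetical illustration: ToyGraph and ConstraintViolation are not py2neo/Neo4j APIs.


class ConstraintViolation(Exception):
    """Raised when a labeled node would duplicate a constrained property value."""


class ToyGraph:
    def __init__(self, constraints):
        # constraints maps a label to the property keys that must be unique for that label
        self.constraints = constraints
        self.nodes = []  # each node is a dict: {"labels": set, "props": dict}

    def _check(self, labels, props, ignore=None):
        # Only nodes that carry a constrained label are compared, so an unlabeled
        # node is never checked against anything.
        for label in labels:
            for key in self.constraints.get(label, ()):
                value = props.get(key)
                if value is None:
                    continue
                for other in self.nodes:
                    if (other is not ignore and label in other["labels"]
                            and other["props"].get(key) == value):
                        raise ConstraintViolation(f"{label}.{key}={value!r} already exists")

    def create(self, *labels, **props):
        # Labels given at creation time are checked before the node is stored,
        # so a rejected node never exists at all.
        self._check(labels, props)
        node = {"labels": set(labels), "props": props}
        self.nodes.append(node)
        return node

    def add_labels(self, node, *labels):
        # Labeling is a separate step: if it fails, the already-stored node stays unlabeled.
        self._check(labels, node["props"], ignore=node)
        node["labels"].update(labels)


def create_then_label(graph, label, **props):
    # Mirrors the question's pattern: create an unlabeled node, then add the label,
    # swallowing the error the way a try/except around add_labels would.
    node = graph.create(**props)
    try:
        graph.add_labels(node, label)
    except ConstraintViolation:
        pass
    return node


g = ToyGraph({"Program": ("name", "href")})
create_then_label(g, "Program", name="News", href="/programs/1")
create_then_label(g, "Program", name="News", href="/programs/1")  # second polling run
labeled = [n for n in g.nodes if "Program" in n["labels"]]
unlabeled = [n for n in g.nodes if not n["labels"]]
print(len(g.nodes), len(labeled), len(unlabeled))  # prints: 2 1 1

# Supplying the label at creation time rejects the duplicate without storing anything.
try:
    g.create("Program", name="News", href="/programs/1")
except ConstraintViolation as exc:
    print("rejected:", exc)  # prints: rejected: Program.name='News' already exists
print(len(g.nodes))  # prints: 2
```

The point of the model is only the ordering: a constraint scoped to a label cannot see a node until it has that label, so the label has to be part of the same create operation.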

    Can you use py2neo in a way that adds the label at the same time as the node is created?

    Otherwise, you could use a Cypher query:

    CREATE (program:Program{name: {programname}, href: {programhref}})
    CREATE (session:Session{name: {sessionname}, href: {sessionhref}})
    

    Using py2neo, you should be able to run this as suggested in the docs:

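    from py2neo import neo4j  # py2neo 1.6-era API

    graph_db = neo4j.GraphDatabaseService()
    # Label and properties are set in a single CREATE, so a duplicate makes the
    # whole statement fail instead of leaving an unlabeled node behind
    qs = '''CREATE (program:Program{name: {programname}, href: {programhref}})
            CREATE (session:Session{name: {sessionname}, href: {sessionhref}})'''
    query = neo4j.CypherQuery(graph_db, qs)
    query.execute(programname=programname, programhref=programhref,
                  sessionname=sessionname, sessionhref=sessionhref)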
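Since the goal is to add nodes only when they don't already exist, Cypher's MERGE (a different clause from the CREATE above) may fit better: it matches an existing node or creates it if none matches, instead of raising a constraint error. The sketch below only builds the query and parameters. It assumes Neo4j 3.0 or later (parameters written as $name rather than the older {name} form used above), assumes href is the identifying key to merge on, and uses a hypothetical helper merge_node_query with a hypothetical label whitelist ALLOWED_LABELS.

```python
# Sketch: build a MERGE statement so an existing node is matched instead of duplicated.
# Assumptions (hypothetical): `href` is the merge key, and Neo4j is 3.0+ so
# parameters use the $name syntax rather than the older {name} form.

# Labels cannot be passed as query parameters, so only known labels are interpolated.
ALLOWED_LABELS = {"Program", "Session"}


def merge_node_query(label, name, href):
    """Return (cypher, params) that creates a labeled node only if no node with this href exists."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    cypher = (
        f"MERGE (n:{label} {{href: $href}}) "  # match on the unique href, or create
        "ON CREATE SET n.name = $name "        # set the name only for a newly created node
        "RETURN n"
    )
    return cypher, {"href": href, "name": name}


query, params = merge_node_query("Program", "News", "/programs/1")
print(query)
# prints: MERGE (n:Program {href: $href}) ON CREATE SET n.name = $name RETURN n
print(params)
```

With py2neo v3 or later you could then run it with something like graph.run(query, params); on the older py2neo 1.6 API shown above you would switch to {href}/{name} placeholders and call neo4j.CypherQuery(graph_db, query).execute(**params) instead. If an incoming item reuses an existing name with a new href, the name constraint will still reject it, which is usually what you want.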