Correct way to bulk insert / merge nodes and edges

I've been using Neo4j with py2neo for a couple of weeks. Until now it was fine to save each node in its own transaction, so I would have different node types:

class NodeA(GraphObject):
  ...

class NodeB(GraphObject):
  ...

# create some nodes from data and simply save them one by one
for data in dataset:
  node_a = NodeA(data)
  node_b = NodeB(data)

  if x:
    node_a.related_to_b.add(node_b)

  g.merge(node_b)
  g.merge(node_a)

Nothing fancy. However, I'm starting to get more nodes and connections, and single transactions don't really work anymore, as expected. I've been looking for ways to do bulk inserts, but can't find any good resources. The best I've managed to accomplish is using unwind_merge_nodes_query, which has three issues:

isn't that fast (~5 seconds for 700 very basic nodes on my laptop)
edges need to be handled separately
it requires keeping track of all the node ids to be able to handle edge connections

I've been writing functions to handle the points above, but I feel like I'm missing something and that there's a simpler way to handle batches of data.

Solution

The unwind_merge_nodes_query function isn't generally intended to be used directly, although you can do so. Usually, you'd want to use the functions from the py2neo.bulk module instead, which wrap these functions.
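As a rough illustration of the py2neo.bulk route, here is a minimal sketch. It assumes py2neo 2021.1 or later (for g.auto() and the bulk module) and that every node carries a unique property, here called uid. The labels NodeA and NodeB come from the question; the uid property, the RELATED_TO_B relationship type, the batch size and the helper names are placeholders of mine, not anything py2neo prescribes. As I understand the documentation, merge_relationships can match start and end nodes by a label and property key, which would remove the need to track internal node ids.

```python
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_merge(graph, a_rows, b_rows, links, batch_size=1000):
    """Merge nodes and relationships in batches (hypothetical helper).

    a_rows / b_rows: lists of dicts, each containing a "uid" property.
    links: list of (a_uid, {relationship properties}, b_uid) triples.
    """
    # Imported lazily so batched() stays usable without py2neo installed.
    from py2neo.bulk import merge_nodes, merge_relationships

    for chunk in batched(a_rows, batch_size):
        # ("NodeA", "uid") means: merge on label NodeA and property uid.
        merge_nodes(graph.auto(), chunk, ("NodeA", "uid"))
    for chunk in batched(b_rows, batch_size):
        merge_nodes(graph.auto(), chunk, ("NodeB", "uid"))
    for chunk in batched(links, batch_size):
        # Start and end nodes are looked up by property, not by node id.
        merge_relationships(
            graph.auto(), chunk, ("RELATED_TO_B",),
            start_node_key=("NodeA", "uid"),
            end_node_key=("NodeB", "uid"),
        )
```

Something like bulk_merge(g, [{"uid": 1}], [{"uid": 2}], [(1, {}, 2)]) would then replace the per-node g.merge calls. Whether this is actually faster depends on the data and on whether the uid property is indexed.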

Either way though, that nuance is unlikely to help much with your specific problems. As a client-side library, py2neo can only carry out operations exposed by the Neo4j server and, unfortunately, there exists no good (low level) way to import non-trivial bulk data from the client. Py2neo can't fix that.

If performance is your goal, your best bet might be to use a LOAD CSV Cypher statement instead. Note though that to do this, your input data file will need to be on, or directly visible to, the server.
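A minimal sketch of the LOAD CSV approach, under some assumptions: the CSV files are written into a directory the server can read (by default Neo4j resolves file:/// URLs relative to its import directory), and nodes are identified by a uid property. The file names, the uid and name properties, the RELATED_TO_B type and the helper functions are all hypothetical.

```python
import csv
from pathlib import Path


def write_rows(path, fieldnames, rows):
    """Write a list of dicts to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Placeholder statements; adjust labels and properties to the real model.
# LOAD CSV reads every field as a string, so convert types in Cypher if needed.
LOAD_NODE_A = """
LOAD CSV WITH HEADERS FROM 'file:///node_a.csv' AS row
MERGE (a:NodeA {uid: row.uid})
SET a.name = row.name
"""

LOAD_EDGES = """
LOAD CSV WITH HEADERS FROM 'file:///a_to_b.csv' AS row
MATCH (a:NodeA {uid: row.a_uid})
MATCH (b:NodeB {uid: row.b_uid})
MERGE (a)-[:RELATED_TO_B]->(b)
"""


def load_via_csv(graph, import_dir, nodes_a, edges):
    """Dump the data to CSV, then let the server import it (hypothetical helper)."""
    import_dir = Path(import_dir)
    write_rows(import_dir / "node_a.csv", ["uid", "name"], nodes_a)
    write_rows(import_dir / "a_to_b.csv", ["a_uid", "b_uid"], edges)
    graph.run(LOAD_NODE_A)  # graph is a py2neo Graph
    graph.run(LOAD_EDGES)
```

The edge statement assumes the NodeB nodes were loaded beforehand with a similar statement. Because edges are matched on uid, no internal node ids need to be carried around on the client.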