python neo4j py2neo

Fastest way to perform bulk add/insert in Neo4j with Python?


I am finding Neo4j slow to add nodes and relationships/arcs/edges when using the REST API via py2neo for Python. I understand that this is due to each REST API call executing as a single self-contained transaction.

Specifically, adding a few hundred pairs of nodes with relationships between them takes a number of seconds, running on localhost.

What is the best approach to significantly improve performance whilst staying with Python?

Would using bulbflow and Gremlin be a way of constructing a bulk insert transaction?

Thanks!


Solution

  • There are several ways to do a bulk create with py2neo, each making only a single call to the server.

    1. Use the create method to build a number of nodes and relationships in a single batch.
    2. Use a Cypher CREATE statement.
    3. Use the new WriteBatch class (just released this week) to manually make a batch of nodes and relationships (this is really just a manual version of 1).
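
As a rough illustration of option 2, here is a minimal sketch that packs many node pairs into one parameterised Cypher statement, so the whole insert needs only one call to the server. It is not taken from py2neo: the helper name `build_bulk_create`, the `Item` label, the `name` property and the `LINKS_TO` relationship type are all made up for the example. It also assumes a Neo4j version that supports `UNWIND` and `$param` placeholders, and it uses `UNWIND` in front of `CREATE` rather than one long hand-written `CREATE`. Sending the query is left to whichever client you use.

```python
import re

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are validated before being interpolated into the query text.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_bulk_create(pairs, label="Item", rel_type="LINKS_TO"):
    """Return (query, params) that create every pair in one statement.

    `pairs` is an iterable of (source_name, target_name) tuples.
    The label, property name and relationship type are hypothetical.
    """
    for name in (label, rel_type):
        if not _IDENT.fullmatch(name):
            raise ValueError("unsafe identifier: %r" % (name,))

    # One fixed query; the data travels separately as a list parameter.
    query = (
        "UNWIND $rows AS row "
        "CREATE (a:{label} {{name: row.source}})"
        "-[:{rel}]->"
        "(b:{label} {{name: row.target}})"
    ).format(label=label, rel=rel_type)

    params = {"rows": [{"source": s, "target": t} for s, t in pairs]}
    return query, params


if __name__ == "__main__":
    query, params = build_bulk_create([("a1", "b1"), ("a2", "b2")])
    print(query)
    print(len(params["rows"]), "pairs in one request")
```

Note that this creates two fresh nodes per pair, so a name that appears in several pairs ends up duplicated. If nodes should be shared, `MERGE` is the usual alternative to `CREATE`.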

    If you have some code, I'm happy to look at it and suggest performance tweaks. The py2neo test suite also has quite a few examples you may be able to draw inspiration from.

    Cheers, Nige