Tags: python, neo4j, graph-databases, py2neo

Query writing performance on neo4j with py2neo


Currently I'm struggling to find a performant way to run multiple queries with py2neo. My problem: I have a big list of write queries in Python that need to be written to Neo4j.

I tried multiple ways to solve the issue. The best-working approach for me so far was the following:

from py2neo import Graph

queries = ["create (n) return id(n)", "create (n) return id(n)", ...]  # list of queries
g = Graph()
t = g.begin(autocommit=False)
for idx, q in enumerate(queries):
    t.run(q)
    if (idx + 1) % 100 == 0:  # commit every 100 queries
        t.commit()
        t = g.begin(autocommit=False)
t.commit()  # commit the final partial batch

It still takes too long to write the queries. I also tried the batch-run procedures from APOC without success; the query never finished. I also tried the same writing method with autocommit. Is there a better way to do this? Are there any tricks, like dropping indexes first and then re-adding them after inserting the data?

-- Edit: Additional information:

I'm using Neo4j 3.4, Py2neo v4 and Python 3.7


Solution

  • You may want to read up on Michael Hunger's tips and tricks for fast batched updates.

    The key trick is using UNWIND to transform list elements into rows, and then subsequent operations are performed per row.

    There are supporting functions that can easily create lists for you, like range().

    As an example, if you wanted to create 10k nodes and add a name property, then return the node name and its graph id, you could do something like this:

    UNWIND range(1, 10000) as index
    CREATE (n:Node {name:'Node ' + index})
    RETURN n.name as name, id(n) as id
    

    Likewise, if you have a good amount of data to import, you can create a list of parameter maps, pass it to the query, and then UNWIND the list to process all entries in a single query, similar to how we process CSV files with LOAD CSV.
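
    As a minimal sketch of that approach with py2neo (the `Node` label, `name` property, batch size, and connection details here are illustrative assumptions, not part of the question):

    ```python
    def chunked(items, size):
        """Yield successive batches of at most `size` items."""
        for i in range(0, len(items), size):
            yield items[i:i + size]

    # One Cypher statement consumes a whole batch: UNWIND turns the
    # $batch parameter (a list of maps) into rows, one CREATE per row.
    IMPORT_QUERY = """
    UNWIND $batch AS row
    CREATE (n:Node {name: row.name})
    """

    def import_nodes(graph, rows, batch_size=10000):
        # One write query per batch instead of one query per node.
        for batch in chunked(rows, batch_size):
            graph.run(IMPORT_QUERY, batch=batch)

    if __name__ == "__main__":
        # Connection URI and credentials are placeholders; adjust for your setup.
        from py2neo import Graph
        g = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
        rows = [{"name": "Node %d" % i} for i in range(1, 100001)]
        import_nodes(g, rows, batch_size=10000)
    ```

    Compared to running one query per node, this sends far fewer statements over the wire and lets the server do the per-row work inside a single transaction per batch.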