Does anyone have experience parsing and importing data into Neo4j using py2neo and Python? I'm currently trying to parse a relatively large .csv file (18700 rows x 17 columns) and store the resulting nodes and relationships in Neo4j. With py2neo, one first creates a model inheriting from py2neo.data.Node and then uses
    for n in nodes:
        tx = graph.begin()
        tx.create(n)   # fixed: the loop variable is `n`, not `node`
        tx.commit()    # each entity is committed in its own transaction
    for r in relations:
        tx = graph.begin()
        tx.create(r)
        tx.commit()
to store all the data. Parsing and storing together take roughly 2.5 minutes (real time) when measured with time python ..., split about evenly between parsing and storing.
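For comparison, the same loop with one transaction for the whole import, rather than one per entity, would look like this; a minimal sketch, assuming the py2neo v4 API and the nodes and relations lists built by the parser above:

    # One transaction for the entire import instead of one per entity.
    tx = graph.begin()
    for n in nodes:
        tx.create(n)
    for r in relations:
        tx.create(r)
    tx.commit()  # everything is committed once, at the end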
Another approach is to build one big query string, which I managed to do. One can then run graph.run(big_query_string) to do the same job. With this, parsing takes about 3 seconds, but storing still takes about 2.5 minutes. Running the same query string directly in the browser took over 3 minutes.
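A pattern that often speeds this up considerably is to send the rows as a query parameter and let a single UNWIND statement create them in batches, instead of concatenating thousands of CREATE statements. A sketch, where the label Record, the property names, the connection details, and the batch size are assumptions rather than part of my actual setup:

    from py2neo import Graph

    graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # assumed connection details

    # `rows` is the parsed CSV as a list of dicts, e.g. [{"id": "1", "name": "a"}, ...]
    query = """
        UNWIND $rows AS row
        CREATE (n:Record)   // `Record` is a placeholder label
        SET n = row
    """

    batch_size = 5000  # arbitrary; trades memory per request against round-trips
    for i in range(0, len(rows), batch_size):
        graph.run(query, rows=rows[i:i + batch_size])

With this the server compiles one plan and executes it over each batch, which is typically far cheaper than running every CREATE as its own statement.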
We are two people on the same project: I am on Neo4j and the other is on DGraph. At its core it is the same parsing code, but storing into DGraph takes at most 5 seconds...
Does anyone have experience with this?
UPDATE: There are exactly 115139 CREATE statements in the query.
Py2neo is not optimised for large imports such as this. You are better off using one of the dedicated import tools for Neo4j instead.
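For a file of this size, LOAD CSV is usually the simplest of those tools: the server reads the file itself, so the per-statement driver overhead disappears. A sketch, assuming the file sits in the server's import directory and that Record and the property names stand in for the real schema:

    from py2neo import Graph

    graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))  # assumed connection details

    # The server reads the CSV directly from its `import` directory.
    graph.run("""
        LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
        CREATE (n:Record {id: row.id, name: row.name})
    """)

The same statement can also be run from the browser or cypher-shell; for bulk offline loads into a fresh database, neo4j-admin import is the usual alternative.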