
Alternative to batch importer for neo4j for large datasets


I am trying to import a large dataset into Neo4j. I wrote a Python script that reads an .xls file and writes the corresponding Cypher queries into a .cql file, which I then ran with the neo4j-shell. This worked for a small dataset, but when I increased the dataset size my system crashed.

I have seen a few suggestions to use batch importers, but they are usually Java-based (e.g., written in Groovy), which I'm not comfortable with. So is there an alternative for batch inserting, or at least a way to batch insert from Python?


Solution

  • You could try Neo4j's LOAD CSV Cypher command. It is very flexible, and combined with USING PERIODIC COMMIT it can handle very large datasets by committing periodically, which keeps a single huge transaction from exhausting memory and speeds up the import.

    The only prerequisite is that you are able to export your original data in CSV format.

    http://neo4j.com/developer/guide-import-csv/

    http://neo4j.com/docs/developer-manual/current/#cypher-query-lang (section 8.6)
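As a minimal sketch of this approach from Python, assuming a hypothetical people.csv with name/age columns, a local Neo4j server, and the official `neo4j` driver (the labels, property names, file path, and credentials are all illustrative, not from the original post):

```python
# Sketch: export rows to CSV, then import them with a periodic-commit
# LOAD CSV statement. The in-memory rows below stand in for data read
# from the .xls file (a reader such as pandas.read_excel would be
# needed for a real spreadsheet).
import csv

rows = [{"name": "Alice", "age": 34}, {"name": "Bob", "age": 29}]

with open("people.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerows(rows)

# USING PERIODIC COMMIT flushes every 1000 rows so a very large file is
# not held in one giant transaction. Note that file:/// URLs are resolved
# relative to the Neo4j server's import directory, so the CSV must be
# placed where the server can read it.
IMPORT_QUERY = """
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CREATE (:Person {name: row.name, age: toInteger(row.age)})
"""

def run_import(uri="bolt://localhost:7687", auth=("neo4j", "password")):
    # Requires the official Python driver: pip install neo4j
    from neo4j import GraphDatabase
    driver = GraphDatabase.driver(uri, auth=auth)
    with driver.session() as session:
        session.run(IMPORT_QUERY)
    driver.close()
```

This keeps the whole pipeline in Python, as asked: the only Java involved is the Neo4j server itself.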