I'm trying to load a graph of several hundred million nodes from CSV using the neo4j-admin import tool. The import runs for about two hours but then crashes with the following error:
Exception in thread "Thread-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.substring(String.java:1969)
at java.util.Formatter.parse(Formatter.java:2557)
at java.util.Formatter.format(Formatter.java:2501)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2940)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector$RelationshipsProblemReporter.getReportMessage(BadCollector.java:209)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector$RelationshipsProblemReporter.message(BadCollector.java:195)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.processEvent(BadCollector.java:93)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector$$Lambda$110/603650290.accept(Unknown Source)
at org.neo4j.concurrent.AsyncEvents.process(AsyncEvents.java:137)
at org.neo4j.concurrent.AsyncEvents.run(AsyncEvents.java:111)
at java.lang.Thread.run(Thread.java:748)
I've tried adjusting my maximum and initial heap size settings in a few different ways. First I tried simply setting a HEAP_SIZE= variable before running the command to load the data, as described here. I also tried setting the heap size on the JVM like this:
export JAVA_OPTS=%JAVA_OPTS% -Xms100g -Xmx100g
but whatever setting I use, when the import starts I get the same report:
Available resources:
Total machine memory: 1.48 TB
Free machine memory: 95.00 GB
Max heap memory : 26.67 GB
Processors: 48
Configured max memory: 1.30 TB
High-IO: true
As you can see, I'm building this on a large server that should have plenty of resources available. I'm assuming I'm not setting the JVM parameters correctly for Neo4j, but I can't find anything online that shows the correct way to do this.
What might be causing this GC overhead error, and how can I resolve it? Is this something I can fix by giving the JVM more resources, and if so, how do I do that so that the neo4j-admin import tool can use them?
RHEL 7
Neo4j CE 3.4.11
Java 1.8.0_131
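A note on the JAVA_OPTS attempt above: %JAVA_OPTS% is Windows cmd syntax, so bash leaves it as literal text, and because the value is unquoted, -Xms100g and -Xmx100g become separate arguments to export rather than part of the variable's value. A minimal sketch of the bash equivalent follows; the '-server' starting value is hypothetical and only there to show appending to an existing value.

```shell
#!/bin/bash
# %VAR% is Windows cmd syntax; bash does not expand it, so this stays literal text.
literal=%JAVA_OPTS%
echo "$literal"                                   # %JAVA_OPTS%

# Bash style: reference the old value with ${JAVA_OPTS} and quote the whole
# value so the heap flags stay inside the one variable.
JAVA_OPTS='-server'                               # hypothetical pre-existing value
export JAVA_OPTS="${JAVA_OPTS} -Xms100g -Xmx100g"
echo "$JAVA_OPTS"                                 # -server -Xms100g -Xmx100g
```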
The issue was resolved by increasing the maximum heap memory. The problem was that I wasn't setting the heap memory allocation correctly.
It turns out there was a simple solution; it was just a matter of when I tried to set the heap memory. Initially, I had run the command export JAVA_OPTS='-server -Xms300g -Xmx300g' at the command line and then run my bash script to call neo4j-admin import. This did not work: neo4j-admin import continued to use the same heap space configuration regardless.
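The post doesn't say exactly why exporting at the command line had no effect, so the following is an assumption: a variable reaches neo4j-admin (and the JVM it starts) only if it is exported in the environment of the process that launches it. Running the script from a different terminal, through sudo, or from a scheduler may start it with a different environment, while an export inside the script does not depend on how the script was launched. This minimal sketch uses bash -c as a hypothetical stand-in for neo4j-admin to show which values a child process actually sees.

```shell
#!/bin/bash
# A child process (like neo4j-admin, which starts the JVM) only sees
# variables that are *exported* in the shell that launches it.
child_sees() {
  # bash -c starts a new process; single quotes defer expansion to that child
  bash -c 'printf "%s\n" "${JAVA_OPTS:-<unset>}"'
}

unset JAVA_OPTS
JAVA_OPTS='-server -Xms300g -Xmx300g'   # assigned but NOT exported
not_exported="$(child_sees)"

export JAVA_OPTS                         # now part of the environment
exported="$(child_sees)"

unset JAVA_OPTS
# Prefix form: the variable is placed in the environment of this one command only
one_shot="$(JAVA_OPTS='-Xmx300g' bash -c 'printf "%s\n" "$JAVA_OPTS"')"

echo "not exported: $not_exported"   # not exported: <unset>
echo "exported: $exported"           # exported: -server -Xms300g -Xmx300g
echo "one-shot: $one_shot"           # one-shot: -Xmx300g
```

The prefix form (VAR=value command) is an alternative to export when only a single command should receive the setting.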
The solution was simply to include the command that sets the heap memory in the shell script that calls neo4j-admin import. My shell script ended up looking like this:
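#!/bin/bash
# Export the heap settings inside the script itself, so they are set in the
# same shell that launches neo4j-admin import.
export JAVA_OPTS='-server -Xms300g -Xmx300g'
# The --nodes/--relationships input-file arguments are not shown here.
/usr/local/neo4j-community-3.4.11/bin/neo4j-admin import \
--ignore-missing-nodes=true \
--database=mag_cs2.graphdb \
--multiline-fields=true \
--high-io=true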
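Because the crash only surfaced about two hours in, a cheap pre-flight check can save a wasted run. The function below, require_max_heap, is hypothetical and not part of Neo4j; it only inspects the JAVA_OPTS string. It cannot confirm what the JVM actually received, so the "Max heap memory" line in the import's startup report is still the place to verify that.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of neo4j-admin): fail fast if
# JAVA_OPTS carries no -Xmx flag, instead of finding out hours into the import.
require_max_heap() {
  # Pad with spaces so a leading -Xmx is matched the same way as a later one
  case " ${JAVA_OPTS:-} " in
    *" -Xmx"*)
      echo "JAVA_OPTS heap flags found: ${JAVA_OPTS}"
      return 0 ;;
    *)
      echo "error: no -Xmx flag in JAVA_OPTS" >&2
      return 1 ;;
  esac
}

export JAVA_OPTS='-server -Xms300g -Xmx300g'
require_max_heap || exit 1
# ...the neo4j-admin import call would follow here
```

In the real script, the check would sit between the export line and the neo4j-admin import call.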
This seems super obvious, but it took me almost a week to realize what I needed to change. Hopefully this saves someone else the same headache.