
Neo4J server is stuck close to 100% CPU


I am running Neo4j 2.0.1 Community Edition on an AWS EC2 instance. The Neo4j server gets stuck at close to 100% CPU after some read requests.

The CPU stays close to 100% even when there are no reads or writes.

The Ubuntu 'top' command just shows a java process consuming the CPU. How do I debug this? How do I find out what Neo4j is doing to keep the CPU close to 100%?

Update: I see the following GC log entries continuously:

70356.833: [GC 485305K->421306K(590488K), 0.0023720 secs]
70356.873: [GC 485498K->421273K(590488K), 0.0023950 secs]
70356.917: [GC 485465K->421152K(590488K), 0.0027120 secs]
70356.961: [GC 485344K->421407K(590488K), 0.0023500 secs]
70357.004: [GC 485599K->421205K(590488K), 0.0034150 secs]
70357.049: [GC 485397K->421174K(590488K), 0.0027470 secs]
70357.097: [GC 485366K->421335K(590488K), 0.0022430 secs]
70357.140: [GC 485527K->421615K(590488K), 0.0024140 secs]
70357.189: [GC 485807K->421826K(590488K), 0.0025360 secs]
70357.237: [GC 486018K->422124K(590488K), 0.0031070 secs]
70357.285: [GC 486316K->421844K(590488K), 0.0024500 secs]
70357.325: [GC 486036K->421985K(590488K), 0.0024550 secs]
70357.365: [GC 486177K->422020K(590488K), 0.0028860 secs]
70357.411: [GC 486212K->421787K(590488K), 0.0025340 secs]
70357.457: [GC 485979K->421863K(590488K), 0.0027430 secs]
70357.505: [GC 486055K->422085K(590488K), 0.0023570 secs]
70357.553: [GC 486277K->422297K(590488K), 0.0024670 secs]
70357.601: [GC 486489K->422474K(590488K), 0.0023700 secs]

I keep seeing these GC log entries for a very long time even though no queries are hitting the server. I think GC is consuming close to 100% CPU (or is it something else?).
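
One rough way to check whether GC pauses alone explain the CPU usage is to total the pause times in the log and compare them with the elapsed time. The sketch below is a minimal Python example that assumes the log lines look exactly like the ones quoted above; the function name summarize_gc and the regular expression are illustrative and not part of Neo4j or the JVM. On the 18 lines above, the pauses add up to roughly 6% of the elapsed wall-clock time, while the heap refills by about 64 MB between collections, which hints that something is still allocating heavily rather than GC being the whole story. A collector that uses several threads can burn more CPU than the pause time suggests, so treat the numbers as a hint only.

```python
import re

# Matches lines such as:
# 70356.833: [GC 485305K->421306K(590488K), 0.0023720 secs]
# Assumption: only this simple format is handled.
GC_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+): \[GC (?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<total>\d+)K\), (?P<pause>\d+\.\d+) secs\]"
)


def summarize_gc(lines):
    """Return pause-time fraction and rough allocation rate for GC log lines."""
    events = []
    for line in lines:
        m = GC_LINE.match(line.strip())
        if m:
            events.append(
                (float(m["ts"]), int(m["before"]), int(m["after"]), float(m["pause"]))
            )
    if len(events) < 2:
        raise ValueError("need at least two GC events to measure a window")

    # Elapsed time between the first and the last collection.
    window = events[-1][0] - events[0][0]
    # Timestamps mark the start of a collection, so every pause except the
    # last one falls inside the window.
    pause_total = sum(e[3] for e in events[:-1])
    # Heap growth between collections: used before this GC minus used after
    # the previous GC.
    allocated_kb = sum(cur[1] - prev[2] for prev, cur in zip(events, events[1:]))

    return {
        "events": len(events),
        "window_secs": window,
        "gc_time_fraction": pause_total / window,
        "alloc_mb_per_sec": allocated_kb / 1024 / window,
    }
```

To use it, pass the lines of the GC log, for example summarize_gc(open(path)) where path is wherever your GC log is written.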

Java-neo4j thread dump when CPU is close to 100%: https://onedrive.live.com/redir?resid=49F6403CD7EC37D4!107&authkey=!AM_esZ8nS-iPRCQ&ithint=file%2clog


Solution

  • Looking at the thread dump that you provided, I can see 6 queries still running for requests that came in over the REST endpoint (or at least that is how I interpret the lines "at org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:83)", all of which occur in threads in the RUNNABLE state).

    As @JimBaird says, you probably have some queries that you thought had finished but that are still running in the background and thrashing your machine.

    Unfortunately, I do not think that you can kill a slow query, so you might need to restart the server.
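
If you want to repeat this check on a later dump, the counting can be scripted. Below is a minimal sketch in Python, assuming the dump is in the usual jstack text layout (one block per thread, blocks separated by blank lines, a quoted thread name on the first line, and a "java.lang.Thread.State:" line). The function name runnable_threads_with_frame is made up for illustration.

```python
import re


def runnable_threads_with_frame(dump_text, frame):
    """Return names of RUNNABLE threads whose stack contains `frame`."""
    names = []
    # Thread blocks in a jstack-style dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        # A thread block starts with the quoted thread name.
        if not lines or not lines[0].startswith('"'):
            continue
        if "java.lang.Thread.State: RUNNABLE" not in block:
            continue
        # Only look at stack frame lines ("at some.Class.method(...)").
        frames = [l.strip() for l in lines if l.strip().startswith("at ")]
        if any(frame in f for f in frames):
            names.append(lines[0].split('"')[1])
    return names
```

Calling it with the text of the dump and "CypherResultRepresentation.serialize" as the frame should list the threads that are still serializing Cypher results. If the same threads show up in dumps taken a few minutes apart, those queries are most likely the ones keeping the CPU busy.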