Note: please see the update towards the end.
I'm running this query against a graph that contains at least 300,000 nodes. After a while, the memory usage on the server suddenly spikes to about 55 GB (killing the query); typical safe usage is about 30 GB.
MATCH (d:Doc)
WHERE d.id = <docId>
MATCH (d)-[:pages]->(p:Page)
CALL apoc.path.expandConfig(p, {
    relationshipFilter: "child",
    bfs: false
})
YIELD path
WITH path, [r in relationships(path) | [r.order, id(r)]] AS orders
RETURN path
ORDER BY orders
SKIP 0 LIMIT 40000;
I can make it run for longer before failing by reducing the batch size, but ultimately it still fails.
This is the error I get in my C# code when the server becomes unresponsive.
Connection with the server breaks due to IOException: Unexpected end of stream, unable to read expected data from the network connection Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0.
The database does recover after a while, but it's impossible for me to process all my records successfully.
The strange thing is I only get this behaviour in one of our databases. I can process the exact same data set in other Neo4j instances with no problem (and no memory spike either), despite them all being provisioned the same way with the same config (they're running in Docker containers in Azure).
Can anyone suggest why? I'm using Neo4j Community 4.2.4 and APOC 4.2.0.5.
Update: As Charchit pointed out, the way we were doing the ordering is very inefficient; it produced 11 million database hits each time we ran the query. Removing it altogether fixed the memory spike on the server. However, now our paths are not returned in the correct order.
We store the order on each relationship, and need to bring the paths back using that. In the example below, we need path (a)-(b)-(c), then (a)-(b)-(d) and finally (a)-(e).
I found an open issue that sounds like it might help us in the future (not sure if it will support ordering though). In the meantime, is there a way of achieving this without the huge memory usage caused by our original implementation?
Thanks
As per the graph shown in the picture, you can create a sort key by concatenating the order properties of the relationships on each path, and order by it, like this:
MATCH (a:A)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: "child",
    bfs: false
})
YIELD path
WITH path, reduce(str = '', r in relationships(path) | str + toString(r.order)) AS orders
RETURN path
ORDER BY orders
SKIP 0 LIMIT 40000;
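Note that this key is compared lexicographically, so once the order values go above 9 the concatenated string can sort counter-intuitively. A quick illustration (runnable in the Neo4j browser):

```cypher
// Lexicographic vs numeric comparison of order keys
RETURN '10' < '2' AS stringCompare,  // true: '1' < '2' character by character
       10 < 2     AS intCompare;     // false
```

That mismatch is why the integer-based variant later in this answer can be preferable when orders are multi-digit.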
The graph was created by the following query:
MERGE (a:A)-[:child {order: 0}]->(b:B)-[:child {order: 0}]->(c:C)
MERGE (b)-[:child {order: 1}]->(d:D)
MERGE (a)-[:child {order: 1}]->(e:E)
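To sanity-check the keys on this sample graph, you can return the concatenated string next to each path (a quick inspection query, assuming the same schema as above):

```cypher
// Show each path's concatenated order key for the sample graph
MATCH (a:A)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: "child",
    bfs: false
})
YIELD path
WITH path, reduce(str = '', r in relationships(path) | str + toString(r.order)) AS orders
RETURN [n IN nodes(path) | labels(n)[0]] AS hops, orders
ORDER BY orders;
// e.g. the path (a)-(b)-(d) yields the key "01"
```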
If you want to order using an integer, try this:
MATCH (a:A)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: "child",
    bfs: false
})
YIELD path
WITH path, reduce(str = '', r in relationships(path) | str + toString(r.order)) AS orders
WITH path, toInteger(orders) AS orders
RETURN path
ORDER BY orders
SKIP 0 LIMIT 40000;
This query should give you the most reliable ordering:
MATCH (a:A)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: "child",
    bfs: false
})
YIELD path
WITH path, reduce(str = '', r in relationships(path) | str + toString(r.order)) AS ordersAsString
WITH path, ordersAsString, toInteger(ordersAsString) AS ordersAsInt
RETURN path
ORDER BY ordersAsInt, ordersAsString
SKIP 0 LIMIT 40000;
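The reason for ordering by both values: toInteger collapses leading zeros, so keys like '01' and '1' tie numerically, and the string comparison then breaks the tie so the shorter prefix path still sorts first. For example:

```cypher
// Leading zeros tie as integers but not as strings
RETURN toInteger('01') AS asInt1,         // 1
       toInteger('1')  AS asInt2,         // 1
       '01' < '1'      AS stringTieBreak; // true
```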