
Edge creation too slow when using Neo4j .Net client


I've got around 800k nodes and I'm trying to insert around 8 million edges into Neo4j Enterprise Edition using the Neo4j .NET client.

I'm doing something like the following, and the process is really slow. I tried the Neo4j driver as well, but that's slow too. The Name property is already indexed. Could anybody please suggest a faster way to create the edges?

public static async Task AddEdges(List<Edge> edgeTable, IGraphClient client)
{
    try
    {
        foreach (var item in edgeTable)
        {
            await client.Cypher
                .Match("(parentNode:MyNodeType)", "(childNode:MyNodeType)")
                .Where((MyNodeType parentNode, MyNodeType childNode) => parentNode.Name == item.SourceName && childNode.Name == item.MemberName)
                .Create("(childNode)-[:belongsTo]->(parentNode)")
                .ExecuteWithoutResultsAsync();
        }
    }
    catch (Exception ex)
    {
        //ex handling
    }
}

Solution

  • You are awaiting each ExecuteWithoutResultsAsync call inside the loop, so the requests run strictly one after another: the current HTTP request has to be answered before the next one can be sent, even though you don't care about the responses. (By the way, calling ExecuteWithoutResults, with no await, would serialize the requests in the same way.) This kind of serialization should be avoided when possible. Given your use case, though, deadlocks are possible with parallel execution, because creating a relationship write-locks its end nodes.
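
    To see the effect of await inside a loop in isolation, here is a minimal sketch that replaces the real request with a hypothetical SimulatedRequest (a 50 ms Task.Delay) and counts how many requests are in flight at once. None of these names come from Neo4jClient; they are placeholders for illustration only.

    ```csharp
    using System;
    using System.Linq;
    using System.Threading.Tasks;

    public static class AwaitDemo
    {
        private static readonly object Gate = new object();
        private static int inFlight;
        private static int maxInFlight;

        // Hypothetical stand-in for one request to the server.
        private static async Task SimulatedRequest()
        {
            lock (Gate) { inFlight++; maxInFlight = Math.Max(maxInFlight, inFlight); }
            await Task.Delay(50);
            lock (Gate) { inFlight--; }
        }

        private static void Reset()
        {
            lock (Gate) { inFlight = 0; maxInFlight = 0; }
        }

        private static int ReadMax()
        {
            lock (Gate) { return maxInFlight; }
        }

        // await inside the loop: the next request starts only after the previous one has finished.
        public static async Task<int> MaxInFlightWhenAwaitedInLoop(int count)
        {
            Reset();
            for (int i = 0; i < count; i++)
                await SimulatedRequest();
            return ReadMax();
        }

        // Start every request first, then wait for all of them together.
        public static async Task<int> MaxInFlightWhenStartedTogether(int count)
        {
            Reset();
            var tasks = Enumerable.Range(0, count).Select(_ => SimulatedRequest()).ToList();
            await Task.WhenAll(tasks);
            return ReadMax();
        }
    }
    ```

    The awaited loop never has more than one request in flight, which is exactly the serialization described above. Starting the tasks together removes that limit, which is also why it opens the door to the lock conflicts mentioned above.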

    Also, you are only performing a single CREATE operation in each transactional request. This means you are making 8 million serialized transactional requests. Naturally, that is going to be slow.

    One solution that allows a degree of concurrency while still avoiding deadlocks is to analyze your data (ideally, programmatically) and split the edges into groups that share no nodes with any other group. The edges within a group still have to be processed one after another, but different groups can be processed in parallel without deadlocking each other.

    If you can perform your operations on the N edges within one group in a single transaction, then you avoid the overhead of making N synchronous transactional requests for that group, and the deadlocks mentioned above are avoided as well.
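
    One possible way to find such groups programmatically, sketched under the assumption that node names are unique: treat the edge list as an undirected graph and take its connected components with a union-find structure. The Edge record below is a hypothetical stand-in for the Edge type in the question (record syntax needs C# 9 or later), and EdgeGrouping and GroupByComponent are illustrative names, not part of any library.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical stand-in for the question's Edge type.
    public record Edge(string SourceName, string MemberName);

    public static class EdgeGrouping
    {
        // Groups edges by connected component, so that no node name appears in two groups.
        public static List<List<Edge>> GroupByComponent(IEnumerable<Edge> edges)
        {
            var parent = new Dictionary<string, string>();

            // Union-find "find" with path halving; unseen names become their own root.
            string Find(string x)
            {
                if (!parent.ContainsKey(x)) parent[x] = x;
                while (parent[x] != x)
                {
                    parent[x] = parent[parent[x]];
                    x = parent[x];
                }
                return x;
            }

            var edgeList = edges.ToList();

            // Merge the components of the two end nodes of every edge.
            foreach (var e in edgeList)
            {
                var a = Find(e.SourceName);
                var b = Find(e.MemberName);
                if (a != b) parent[a] = b;
            }

            // Edges whose end nodes share a root belong to the same group.
            return edgeList
                .GroupBy(e => Find(e.SourceName))
                .Select(g => g.ToList())
                .ToList();
        }
    }
    ```

    If most of your nodes turn out to be connected to each other, this yields one very large group and little parallelism; in that case the main gain comes from batching the edges of that group into fewer, larger requests.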

    Using the UNWIND clause, you can iterate operations over the data from a list in a single request. Something like the following should work. Note that the edgeTable input list must contain the edges from a single group, as discussed above:

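    public static Task AddEdges(List<Edge> edgeTable, IGraphClient client) {
        // UNWIND turns the list into one row per edge, bound to "item" on the server.
        // "item" exists only inside the Cypher query, not as a C# variable, so the
        // WHERE clause is passed as a string. This assumes the Edge properties are
        // serialized under their C# names (SourceName, MemberName).
        // The task is returned without being awaited here, so the caller can start
        // several groups and await them together; failures surface on that task.
        return client.Cypher
          .Unwind(edgeTable, "item")
          .Match("(parentNode:MyNodeType)", "(childNode:MyNodeType)")
          .Where("parentNode.Name = item.SourceName AND childNode.Name = item.MemberName")
          .Create("(childNode)-[:belongsTo]->(parentNode)")
          .ExecuteWithoutResultsAsync();
    }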
    

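    Notice that I am calling ExecuteWithoutResultsAsync without await, so that the groups can be processed concurrently. Keep in mind that the failure of an un-awaited call surfaces only when its task is eventually awaited, so the tasks should be kept and awaited somewhere (for example with Task.WhenAll over all groups).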

    One caveat, though, is that you do not want to ask the Neo4j server to process too much at once, lest it run out of memory. So, if any group is too large, or if too many groups are being processed at the same time, you may want to throttle the rate at which you call AddEdges, and/or split large groups into smaller chunks and make sure the chunks of a group are processed sequentially with respect to each other.
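
    The throttling and chunking described here could look roughly like the following sketch. It is independent of Neo4jClient: sendChunk is a hypothetical delegate standing in for a call such as AddEdges, and it is assumed to return a task that completes only once the server has finished that request. ThrottledRunner, SplitIntoChunks, chunkSize and maxConcurrentGroups are illustrative names and parameters; suitable values depend on your server and data.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    public static class ThrottledRunner
    {
        // Splits one group into consecutive chunks of at most chunkSize items.
        public static IEnumerable<List<T>> SplitIntoChunks<T>(List<T> items, int chunkSize)
        {
            for (int i = 0; i < items.Count; i += chunkSize)
                yield return items.GetRange(i, Math.Min(chunkSize, items.Count - i));
        }

        // Processes at most maxConcurrentGroups groups at a time.
        // Chunks of the same group are sent strictly one after another.
        public static async Task RunAsync<T>(
            IEnumerable<List<T>> groups,
            Func<List<T>, Task> sendChunk,
            int chunkSize,
            int maxConcurrentGroups)
        {
            using var gate = new SemaphoreSlim(maxConcurrentGroups);

            async Task ProcessGroup(List<T> group)
            {
                await gate.WaitAsync();
                try
                {
                    foreach (var chunk in SplitIntoChunks(group, chunkSize))
                        await sendChunk(chunk);
                }
                finally
                {
                    gate.Release();
                }
            }

            // Start all groups; the semaphore limits how many actually run at once.
            var tasks = groups.Select(g => ProcessGroup(g)).ToList();
            await Task.WhenAll(tasks);
        }
    }
    ```

    A call might then look like `await ThrottledRunner.RunAsync(groups, chunk => AddEdges(chunk, client), 10000, 4);`, where the chunk size and concurrency limit are placeholder values to tune against the memory available to the server.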