memgraphdb

How to efficiently load ~500K nodes and 200K relationships on the fly with automation?


We need to load approximately 500,000 nodes and 200,000 relationships into our system dynamically and automate the process for reproducibility. Our infrastructure relies heavily on NATS, and we're hoping to integrate this process with it rather than resorting to Kafka as a workaround for this specific task.

According to the documentation, importing from CSV files is the most efficient method for this kind of data load, but that approach doesn't support direct file transmission. I've attempted to batch the inserts in parallel using Go, but the processing has been unsatisfactorily slow.

Has anyone encountered a similar issue and found an effective solution or workaround?


Solution

  • I would recommend a single transaction and a single query with batch parameters, something similar to this in Python:

    import mgclient  # assuming pymgclient, Memgraph's DB-API-compatible Python driver
    
    conn = mgclient.connect(host="127.0.0.1", port=7687)
    cursor = conn.cursor()
    create_list = [{"id": i} for i in range(500_000)]  # example payload: one dict per node
    
    query = """
        WITH $batch AS nodes
        UNWIND nodes AS node
        CREATE (n:Node {id: node.id})
        """
    
    cursor.execute(query, {"batch": create_list})
    conn.commit()
    

    You can do chunks of 10k, 20k, or 50k, but keeping it in a single transaction should be the fastest.
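
    If sending all 500K node parameters in one call turns out to be too heavy, you can split the payload into chunks while still committing only once at the end. Below is a minimal sketch of that, continuing from the snippet above (it reuses the same pymgclient connection and `create_list`; the relationship query, the `CONNECTED_TO` type, and the source/target payload shape are illustrative, not taken from your schema):

    node_query = """
        WITH $batch AS nodes
        UNWIND nodes AS node
        CREATE (n:Node {id: node.id})
        """
    # Hypothetical relationship payload shape: {"source": ..., "target": ...}.
    # An index on :Node(id) helps keep these MATCH lookups fast.
    rel_query = """
        WITH $batch AS rels
        UNWIND rels AS rel
        MATCH (a:Node {id: rel.source}), (b:Node {id: rel.target})
        CREATE (a)-[:CONNECTED_TO]->(b)
        """
    
    relationship_list = [{"source": i, "target": i + 1} for i in range(200_000)]  # placeholder data
    CHUNK_SIZE = 20_000  # anywhere in the 10k-50k range suggested above
    
    def run_in_chunks(cursor, query, rows):
        # Each execute() is one round trip; nothing is committed until the end.
        for start in range(0, len(rows), CHUNK_SIZE):
            cursor.execute(query, {"batch": rows[start:start + CHUNK_SIZE]})
    
    run_in_chunks(cursor, node_query, create_list)         # ~500K nodes
    run_in_chunks(cursor, rel_query, relationship_list)    # ~200K relationships
    conn.commit()  # the single commit keeps the whole load in one transaction

    The same pattern can be driven from your NATS consumers: accumulate incoming messages into a list, then flush it as one parameterized batch per chunk rather than issuing one CREATE per message.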