
Large dataset insertion into neo4j from MySQL


I am using python 3 in conjunction with py2neo (v 3.1.2) to insert a large amount of data from MySQL to Neo4j. The table in MySQL has about 20 million rows. I want to do the insertion without converting the MySQL data to CSV as suggested on neo4j's website.

My code looks like the following:

transaction=graph_db.begin()
sql="SELECT id FROM users"
cursor.execute(sql)
user_data=cursor.fetchall()
count=1
for row in user_data:
    user_node=Node("User",user_id=row[0])
    transaction.create(user_node)
    if count%10000==0:
        transaction.commit()
    count=count+1

The goal was to insert in batches of 10,000, but the transaction fails on the first iteration after the first batch of 10k is committed. The following is the error:

raise TransactionFinished(self)
py2neo.database.TransactionFinished: <py2neo.database.BoltTransaction object at 0x104e36588>

Can someone explain what this error means and how to solve this issue?


Solution

  • I do not know Python well, but the problem is that inside the loop you commit the transaction without ever opening a new one. Once a py2neo transaction has been committed it is finished, and any further `create` call on it raises `TransactionFinished`. Begin a fresh transaction at the start of each batch:

    sql = "SELECT id FROM users"
    cursor.execute(sql)
    user_data = cursor.fetchall()
    count = 1
    for row in user_data:
        # open a fresh transaction at the start of each batch
        if count % 10000 == 1:
            transaction = graph_db.begin()
        user_node = Node("User", user_id=row[0])
        transaction.create(user_node)
        # commit every 10,000 nodes, and also on the last row
        # so a final partial batch is not lost
        if (count % 10000 == 0) or (count == len(user_data)):
            transaction.commit()
        count = count + 1
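An alternative way to express the same idea is to slice the rows into fixed-size chunks up front, so each chunk maps to exactly one begin/commit pair and no modulo bookkeeping is needed. A minimal sketch (the `chunked` helper is illustrative and not part of py2neo; the commented part reuses `graph_db` and `Node` from the question):

```python
def chunked(rows, size):
    """Yield successive fixed-size batches from a list of rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Each batch then maps to exactly one transaction:
# for batch in chunked(user_data, 10000):
#     transaction = graph_db.begin()
#     for row in batch:
#         transaction.create(Node("User", user_id=row[0]))
#     transaction.commit()
```

This also handles the trailing partial batch automatically, since the last slice is simply shorter than `size`.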