I am working on a project that receives information from the GitHub API and analyzes it. The problem is that when I try to insert this kind of large data (for example, into files I insert a list of lists, as seen below, which has 19k items in it), it takes a very long time. How can I improve that? Thanks.
for i in commits_array:
    for j in i[-1]:
        self.insert_value_to_files_DB(j)
def insert_value_to_files_DB(self, array):
    try:
        sql = "INSERT INTO files (file_count, file_sha, file_name, file_status, file_additions, file_deletions, file_changes, commit_sha) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
        self.cursor.execute(sql, array)
        self.connection.commit()
    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))
19,000 rows isn't huge for a DBMS. It's small.
Before you start inserting rows, do
self.connection.start_transaction()
Then, after every hundred rows or so, do
self.connection.commit()
self.connection.start_transaction()
Then, when you're done, do a final commit().
On inserts, it's the commit operations that take time. Putting multiple rows into one commit makes things faster.
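Here's a minimal sketch of that batched-commit pattern, assuming the same self.cursor / self.connection attributes and the files table from your question; the method name insert_files_in_batches and the batch size of 100 are just illustrative, not part of your code.

    import mysql.connector

    # This would live in the same class as insert_value_to_files_DB.
    def insert_files_in_batches(self, commits_array, batch_size=100):
        sql = ("INSERT INTO files (file_count, file_sha, file_name, file_status, "
               "file_additions, file_deletions, file_changes, commit_sha) "
               "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
        try:
            self.connection.start_transaction()
            pending = 0
            for commit in commits_array:
                for file_row in commit[-1]:
                    self.cursor.execute(sql, file_row)
                    pending += 1
                    if pending >= batch_size:
                        # Commit this batch, then open a new transaction for the next one.
                        self.connection.commit()
                        self.connection.start_transaction()
                        pending = 0
            # Final commit for any remaining rows.
            self.connection.commit()
        except mysql.connector.Error as error:
            self.connection.rollback()
            print("Failed to insert into MySQL table {}".format(error))

With a batch size of 100 you pay the commit cost roughly 190 times instead of 19,000, which is where the speedup comes from.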