I'm trying to do a large scale bulk insert into a sqlite database with peewee. I'm using atomic
but the performance is still terrible. I'm inserting the rows in blocks of ~ 2500 rows, and due to the SQL_MAX_VARIABLE_NUMBER I'm inserting about 200 of them at a time. Here is the code:
with helper.db.atomic():
for i in range(0,len(expression_samples),step):
gtd.GeneExpressionRead.insert_many(expression_samples[i:i+step]).execute()
And the list expression_samples
is a list of dictionaries with the appropriate fields for the GeneExpressionRead
model. I've timed this loop, and it takes anywhere from 2-8 seconds to execute. I have millions of rows to insert, and the way I have my code written now it will likely take 2 days to complete. As per this post, there are several pragmas that I have set in order to improve performance. This also didn't really change anything for me performance wise. Lastly, as per this test on the peewee github page it should be possible to insert many rows very fast (~50,000 in 0.3364 seconds) but it also seems that the author used raw sql code to get this performance. Has anyone been able to do such a high performance insert using peewee methods?
Edit: Did not realize that the test on peewee's github page was for MySQL
inserts. May or may not apply to this situation.
Mobius was trying to be helpful in the comments but there's a lot of misinformation in there.
Other information:
Those are some good settings for loading in a bunch of data. Check the sqlite docs for more info: