I have to import about 30k rows from a CSV file to my SQL database, this sadly takes 20 minutes.
Troubleshooting with a profiler shows me that DbSet.Add is taking the most time, but why?
I have these Entity Framework Code-First classes:
public class Article
{
// About 20 properties, each property doesn't store excessive amounts of data
}
public class Database : DbContext
{
public DbSet<Article> Articles { get; set; }
}
For each item in my for loop I do:
db.Articles.Add(article);
Outside the for loop I do:
db.SaveChanges();
It's connected with my local SQLExpress server, but I guess there isn't anything written till SaveChanges is being called so I guess the server won't be the problem....
Each item in a unit-of-work has overhead, as it must check (and update) the identity manager, add to various collections, etc.
The first thing I would try is batching into, say, groups of 500 (change that number to suit), starting with a fresh (new) object-context each time - as otherwise you can reasonably expect telescoping performance. Breaking it into batches also prevents a megalithic transaction bringing everything to a stop.
Beyond that; SqlBulkCopy. It is designed for large imports with minimal overhead. It isn't EF though.