I have a problem inserting a huge amount of data into SQL Server.
Previously I was using Entity Framework, but it was painfully slow for just 100K root-level records (each containing two distinct collections, each of which holds roughly 200K further records), i.e. roughly 500K-600K records in memory. I applied all the usual optimisations (e.g. AutoDetectChangesEnabled = false, and recreating and disposing the context after each batch).
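The batching pattern looked roughly like this (a minimal sketch only; MyContext, Root and batch are placeholder names, not my real code):
using (var context = new MyContext())
{
    //skip change tracking while bulk-adding this batch of roots
    context.Configuration.AutoDetectChangesEnabled = false;
    context.Set<Root>().AddRange(batch);
    context.SaveChanges();
} //context recreated and disposed for each batch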
I rejected that solution and used BulkInsert instead, which is substantially faster and more efficient; I was able to insert 100K records in a minute or so.
But the main problem is getting back the primary keys of the newly inserted records. For this, I am thinking of writing a stored procedure that operates on a TVP (i.e. an in-memory data table holding all 100K root-level records), and inside it I would use OUTPUT INSERTED.Id to get all the primary keys back in the application.
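Roughly what I have in mind (just a sketch; the type, procedure, table and column names are made up for illustration):
-- illustrative table type matching the root records
CREATE TYPE dbo.PersonTableType AS TABLE
(
    Name nvarchar(200),
    DOB  datetime
);
GO
CREATE PROCEDURE dbo.InsertPeople
    @People dbo.PersonTableType READONLY
AS
BEGIN
    -- OUTPUT streams the generated identity values back to the caller
    INSERT INTO dbo.People (Name, DOB)
    OUTPUT INSERTED.Id
    SELECT Name, DOB FROM @People;
END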
So, how does this approach (i.e. a SQL INSERT inside a stored procedure) compare with the SqlBulkCopy approach?
Is there any way to get all the primary keys back after a SqlBulkCopy operation? Or anything concrete on whether OUTPUT INSERTED.Id would reliably return all the correct new keys to the application?
PS: I don't want to create any staging table during the process; that's just overhead.
Here's an example based on the discussion in the comments, expanding on the idea mentioned here: Possible to get PrimayKey IDs back after a SQL BulkCopy?
I've not had a chance to test this, but hopefully it will help:
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
public DataTable CreatePersonDataTable(IEnumerable<PersonDTO> people)
{
    //define the table
    var table = new DataTable("People");
    table.Columns.Add(new DataColumn("Name", typeof(string)));
    table.Columns.Add(new DataColumn("DOB", typeof(DateTime)));
    //populate it
    foreach (var person in people)
    {
        table.Rows.Add(person.Name, person.DOB);
    }
    return table;
}
readonly string ConnectionString; //set this in the constructor
readonly int BulkUploadPeopleTimeoutSeconds = 600; //default; could override in constructor
public IEnumerable<long> BulkUploadPeople(IEnumerable<PersonDTO> people) //you'd want to break this up a bit; for simplicity I've bunged everything into one big method
{
    var data = CreatePersonDataTable(people);
    using (SqlConnection con = new SqlConnection(ConnectionString))
    {
        con.Open(); //keep the same connection open throughout the session, so the temp table stays in scope
        RunSqlNonQuery(con, "select top 0 Name, DOB into #People from People"); //clone the relevant columns (minus the Id) into a session-scoped temp table
        BulkUpload(con, data, "#People");
        var results = TransferFromTempToReal(con, "#People", "People", "Name, DOB", "Id");
        RunSqlNonQuery(con, "drop table #People"); //not strictly required since it's session-scoped and removed when the connection's closed; but best to keep things clean
        return results; //results is a fully materialised list (see TransferFromTempToReal), so it's safe for the connection to close now
    }
}
private void RunSqlNonQuery(SqlConnection con, string sql)
{
    using (SqlCommand command = con.CreateCommand())
    {
        command.CommandText = sql;
        command.ExecuteNonQuery();
    }
}
private void BulkUpload(SqlConnection con, DataTable data, string targetTable)
{
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(con))
    {
        bulkCopy.BulkCopyTimeout = BulkUploadPeopleTimeoutSeconds; //see the field above; define this in your config
        bulkCopy.DestinationTableName = targetTable;
        bulkCopy.WriteToServer(data);
    }
}
private IEnumerable<long> TransferFromTempToReal(SqlConnection con, string tempTable, string realTable, string columnNames, string idColumnName)
{
    var ids = new List<long>(); //materialise the ids so the reader is fully consumed before the temp table is dropped
    using (SqlCommand command = con.CreateCommand())
    {
        //specify the target column list so the identity column isn't expected in the insert
        command.CommandText = string.Format("insert into {0} ({2}) output inserted.{1} select {2} from {3}", realTable, idColumnName, columnNames, tempTable);
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                ids.Add(reader.GetInt64(0)); //assumes the id column is a bigint
            }
        }
    }
    return ids;
}
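Example usage (assuming PersonDTO exposes the Name and DOB properties used above):
var people = new List<PersonDTO>
{
    new PersonDTO { Name = "Alice", DOB = new DateTime(1990, 1, 1) },
    new PersonDTO { Name = "Bob", DOB = new DateTime(1985, 6, 15) }
};
IEnumerable<long> newIds = BulkUploadPeople(people);
One caveat: the OUTPUT clause doesn't guarantee the ids come back in the same order as the source rows, so if you need to match each id to a specific record you'd have to join back on a natural key (Name and DOB here).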
Whilst in your question you've said that you don't want to use a staging table because it's an "overhead"... please give it a try. You may find that the small overhead of creating a staging table is outweighed by the performance you gain from this method.
Obviously it's not going to be as fast as inserting and ignoring the returned ids; but if getting the ids back is a requirement then, in the absence of other answers, this may be your best alternative.