I'm using below extension function to Upsert data with EF Core with EFCore.BulkExtensions
, but the issue is the execution for this function when I tried to insert 2 millions record is taking around 17 minutes and lately it throws this exception
Could not allocate space for object 'dbo.SORT temporary run storage: 140737501921280' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.\r\nThe transaction log for database 'tempdb' is full due to 'ACTIVE_TRANSACTION' and the holdup lsn is (41:136:347)
I can see the storage for "C" partition is decreasing when I executed this function:
When I restart SQL Server, the free space becomes around 30 GB, I tried to use multi threading (parallel) with the insert with no noticed time change, so what do you recommend or is there any issue in the code shown here.
Note: the for loop is not taking too much time even if it was 2 millions records.
public static async Task<OperationResultDto> AddOrUpdateBulkByTransactionAsync<TEntity>(this DbContext _myDatabaseContext, List<TEntity> data) where TEntity : class
{
using (var transaction = await _myDatabaseContext.Database.BeginTransactionAsync())
{
try
{
_myDatabaseContext.Database.SetCommandTimeout(0);
var currentTime = DateTime.Now;
// Disable change tracking
_myDatabaseContext.ChangeTracker.AutoDetectChangesEnabled = false;
// Set CreatedDate and UpdatedDate for each entity
foreach (var entity in data)
{
var createdDateProperty = entity.GetType().GetProperty("CreatedDate");
if (createdDateProperty != null && (createdDateProperty.GetValue(entity) == null || createdDateProperty.GetValue(entity).Equals(DateTime.MinValue)))
{
// Set CreatedDate only if it's not already set
createdDateProperty.SetValue(entity, currentTime);
}
var updatedDateProperty = entity.GetType().GetProperty("UpdatedDate");
if (updatedDateProperty != null)
{
updatedDateProperty.SetValue(entity, currentTime);
}
}
// Bulk insert or update
var updateByProperties = GetUpdateByProperties<TEntity>();
var bulkConfig = new BulkConfig()
{
UpdateByProperties = updateByProperties,
CalculateStats = true,
SetOutputIdentity = false
};
// Batch size for processing
int batchSize = 50000;
for (int i = 0; i < data.Count; i += batchSize)
{
var batch = data.Skip(i).Take(batchSize).ToList();
await _myDatabaseContext.BulkInsertOrUpdateAsync(batch, bulkConfig);
}
// Commit the transaction if everything succeeds
await transaction.CommitAsync();
return new OperationResultDto
{
OperationResult = bulkConfig.StatsInfo
};
}
catch (Exception ex)
{
// Handle exceptions and roll back the transaction if something goes wrong
transaction.Rollback();
return new OperationResultDto
{
Error = new ErrorDto
{
Details = ex.Message + ex.InnerException?.Message
}
};
}
finally
{
// Re-enable change tracking
_myDatabaseContext.ChangeTracker.AutoDetectChangesEnabled = true;
}
}
}
It looks like the manual batching is incurring much more startup cost. Also, you are using CalculateStats = true
, which means each run is going to pointlessly recalculate stats.
SqlBulkCopy
, which is what BulkExtensions uses under the hood, has batching built-in, and you can use that in the bulk config. You don't need to batch it yourself. And then it will calculate the stats at the end (or you can turn that off).
Furthermore, the explicit transaction means that the database transaction log, as well as any worktable and row-versioning in the tempdb
database, cannot be cleared until it is committed. You may want a clean rollback, but it does come at a cost. And if you really did need it, just set the batch size in the config to 0 and you get one big internal transaction.
So all you need is:
var bulkConfig = new BulkConfig()
{
UpdateByProperties = updateByProperties,
CalculateStats = true,
SetOutputIdentity = false,
BatchSize = 50000
};
await _myDatabaseContext.BulkInsertOrUpdateAsync(data, bulkConfig);
Also consider using BulkInsertAsync
rather than BulkInsertOrUpdateAsync
as that can insert directly to the table, rather than going via a temp table and MERGE
.