Search code examples
c#sql-serverperformanceentity-framework-coreefcore.bulkextensions

Upsert huge amount of data by EFCore.BulkExtensions


I'm using below extension function to Upsert data with EF Core with EFCore.BulkExtensions, but the issue is the execution for this function when I tried to insert 2 millions record is taking around 17 minutes and lately it throws this exception

Could not allocate space for object 'dbo.SORT temporary run storage: 140737501921280' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.\r\nThe transaction log for database 'tempdb' is full due to 'ACTIVE_TRANSACTION' and the holdup lsn is (41:136:347)

I can see the storage for "C" partition is decreasing when I executed this function:

enter image description here

When I restart SQL Server, the free space becomes around 30 GB, I tried to use multi threading (parallel) with the insert with no noticed time change, so what do you recommend or is there any issue in the code shown here.

Note: the for loop is not taking too much time even if it was 2 millions records.

public static async Task<OperationResultDto> AddOrUpdateBulkByTransactionAsync<TEntity>(this DbContext _myDatabaseContext, List<TEntity> data) where TEntity : class
{
    using (var transaction = await _myDatabaseContext.Database.BeginTransactionAsync())
    {
        try
        {
            _myDatabaseContext.Database.SetCommandTimeout(0);

            var currentTime = DateTime.Now;

            // Disable change tracking
            _myDatabaseContext.ChangeTracker.AutoDetectChangesEnabled = false;

            // Set CreatedDate and UpdatedDate for each entity
            foreach (var entity in data)
            {
                var createdDateProperty = entity.GetType().GetProperty("CreatedDate");

                if (createdDateProperty != null && (createdDateProperty.GetValue(entity) == null || createdDateProperty.GetValue(entity).Equals(DateTime.MinValue)))
                {
                    // Set CreatedDate only if it's not already set
                    createdDateProperty.SetValue(entity, currentTime);
                }

                var updatedDateProperty = entity.GetType().GetProperty("UpdatedDate");

                if (updatedDateProperty != null)
                {
                    updatedDateProperty.SetValue(entity, currentTime);
                }
            }

            // Bulk insert or update
            var updateByProperties = GetUpdateByProperties<TEntity>();

            var bulkConfig = new BulkConfig()
            {
                UpdateByProperties = updateByProperties,
                CalculateStats = true,
                SetOutputIdentity = false
            };

            // Batch size for processing
            int batchSize = 50000;

            for (int i = 0; i < data.Count; i += batchSize)
            {
                var batch = data.Skip(i).Take(batchSize).ToList();
                await _myDatabaseContext.BulkInsertOrUpdateAsync(batch, bulkConfig);
            }

            // Commit the transaction if everything succeeds
            await transaction.CommitAsync();

            return new OperationResultDto
            {
                OperationResult = bulkConfig.StatsInfo
            };
        }
        catch (Exception ex)
        {
            // Handle exceptions and roll back the transaction if something goes wrong
            transaction.Rollback();
            return new OperationResultDto
            {
                Error = new ErrorDto
                {
                    Details = ex.Message + ex.InnerException?.Message
                }
            };
        }
        finally
        {
            // Re-enable change tracking
            _myDatabaseContext.ChangeTracker.AutoDetectChangesEnabled = true;
        }
    }
}

Solution

  • It looks like the manual batching is incurring much more startup cost. Also, you are using CalculateStats = true, which means each run is going to pointlessly recalculate stats.

    SqlBulkCopy, which is what BulkExtensions uses under the hood, has batching built-in, and you can use that in the bulk config. You don't need to batch it yourself. And then it will calculate the stats at the end (or you can turn that off).

    Furthermore, the explicit transaction means that the database transaction log, as well as any worktable and row-versioning in the tempdb database, cannot be cleared until it is committed. You may want a clean rollback, but it does come at a cost. And if you really did need it, just set the batch size in the config to 0 and you get one big internal transaction.

    So all you need is:

    var bulkConfig = new BulkConfig()
    {
        UpdateByProperties = updateByProperties,
        CalculateStats = true,
        SetOutputIdentity = false,
        BatchSize = 50000
    };
    await _myDatabaseContext.BulkInsertOrUpdateAsync(data, bulkConfig);
    

    Also consider using BulkInsertAsync rather than BulkInsertOrUpdateAsync as that can insert directly to the table, rather than going via a temp table and MERGE.