Search code examples
sql-servermergeentity-framework-coreef-core-8.0

Optimized INSERT of CHILD with MERGE of PARENT


I have the following tables: [ParentTable] (1..m) [ChildTable]. Both have [Id] INT IDENTITY (1, 1).

I would like to determine the optimal way of inserting records in the [ChildTable] with Entity Framework, without fetching the [ParentTable] in memory. More: there many not be a corresponding record in the [ParentTable] at the time of the insert into [ChildTable]. Ideally, we would do a MERGE for the [ParentTable] and an INSERT into the [ChildTable].

With T-SQL this will look like this:

BEGIN TRANSACTION

DECLARE @InsertedIDs TABLE (ID INT);

MERGE TOP (1) INTO dbo.ParentTable as Target
USING 
(
  VALUES ('00000000-9F5D-4D3C-9EDC-65BCBEA2D88F', 'Some Description')
) AS Source (TraceId, Description) 
ON Target.TraceId = Source.TraceId
WHEN NOT MATCHED BY Target THEN
    INSERT (TraceId, Description, SomeDateTimeUtc) VALUES (Source.TraceId, Source.Description, GETUTCDATE())
WHEN MATCHED THEN
    UPDATE SET Target.TraceId = Source.TraceId
        -- this is the match condition, nothing will be changed by the UPDATE
        -- I only do this to bring the record into the 'inserted' special table so I can output the ID
OUTPUT inserted.ID INTO @InsertedIDs; 

INSERT INTO [dbo].[ChildTable] (ParentTableId, SomeMessage, StatusDateTimeUtc)
-- Assumption: only one row will be MERGED into ParentTable at a time
-- So, we will always have only one record in the @InsertedIDs
VALUES ((select top 1 ID from @InsertedIDs), 'Some Message', GETUTCDATE())

select * from ParentTable
select * from ChildTable

ROLLBACK
--COMMIT

The above SQL is performing the requested operation in only one trip to the database. I would like to get the same result with Entity Framework.

Constrains:

  1. I am aware of the context.Database.ExecuteSqlCommand(sql, parameters); method. I would prefer to avoid hardcoded SQL in the code.
  2. I am familiar with commercial libraries that may help with this (like Entity Framework Extensions). These extensions are great and I may resort to them.

Are there any other solutions?

I have created a small PoC in GitHub as a quick-start for the code: https://github.com/mpetcov/MergeAndInsert-poc.

I added benchmarks in my code, including one for the linq2db solution provided by @svyatoslav-danyliv (Thank you!): enter image description here

It is interesting the last 2 benchmarks I ran are showing the EFCore implementation to be faster. I was expecting the Inline SQL one to be faster and the first series of benchmarks were showing it to be slightly faster. Any thoughts?


Solution

  • I would suggest another extension linq2db.EntityFrameworkCore, which support Merge out of the box. Disclaimer, I'm one of the creators.

    With this extensions you can write the following LINQ query, unfortunately it needs two database requests:

    var source = new []
    {
        new ParentTable {TraceId = new Guid("00000000-9F5D-4D3C-9EDC-65BCBEA2D88F"), Description = "Some Description"},
    };
    
    using var transaction = context.Database.BeginTransaction();
    
    var insertedIds = context.ParentTables
        .ToLinqToDB()
        .Merge()
        .Using(source)
        .On((t, s) => t.TraceId == s.TraceId)
        .InsertWhenNotMatched(s => new ParentTable
        {
            TraceId = s.TraceId,
            Description = s.Description,
            SomeDateTimeUtc = Sql.CurrentTimestampUtc
        })
        .UpdateWhenMatched((t, s) => new ParentTable
        {
            TraceId = t.TraceId,
        })
        .MergeWithOutput((a, deleted, inserted) => inserted.Id)
        .ToList();
    
    context.ChildTables.ToLinqToDBTable().Insert(() => new ChildTable
    {
        ParentTableId = insertedIds[0],
        SomeMessage = "SomeMessage",
        StatusDateTimeUtc = Sql.CurrentTimestampUtc
    });
    
    transaction.Commit();
    

    Extension should generate the following SQL:

    MERGE INTO [ParentTable] [Target]
    USING (VALUES
        ('00000000-9f5d-4d3c-9edc-65bcbea2d88f',N'Some Description')
    ) [Source]
    (
        [TraceId],
        [Description]
    )
    ON ([Target].[TraceId] = [Source].[TraceId])
    
    WHEN NOT MATCHED THEN
    INSERT
    (
        [TraceId],
        [Description],
        [SomeDateTimeUtc]
    )
    VALUES
    (
        [Source].[TraceId],
        [Source].[Description],
        SYSUTCDATETIME()
    )
    
    WHEN MATCHED THEN
    UPDATE
    SET
        [Target].[TraceId] = [Target].[TraceId]
    OUTPUT
        [INSERTED].[Id]
    ;
    
    INSERT INTO [ChildTable]
    (
        [ParentTableId],
        [SomeMessage],
        [StatusDateTimeUtc]
    )
    VALUES
    (
        @ParentTableId,
        N'SomeMessage',
        SYSUTCDATETIME()
    )
    

    As I mentioned above, two requets are needed. I don't know any extensions may help here to do that in one request.