Assume a system with multiple concurrent producers, each of which tries to persist a graph of objects that share the following common entities, uniquely identifiable by their names:
CREATE TABLE CommonEntityGroup(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);
GO
CREATE UNIQUE INDEX IX_CommonEntityGroup_Name
    ON CommonEntityGroup(Name)
GO
CREATE TABLE CommonEntity(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    CommonEntityGroupId INT NOT NULL,
    CONSTRAINT FK_CommonEntity_CommonEntityGroup FOREIGN KEY(CommonEntityGroupId)
        REFERENCES CommonEntityGroup(Id)
);
GO
CREATE UNIQUE INDEX IX_CommonEntity_CommonEntityGroupId_Name
    ON CommonEntity(CommonEntityGroupId, Name)
GO
For example, producer A saves some CommonEntityMeetings, while producer B saves CommonEntitySets. Each of them has to persist the CommonEntities related to its particular items.
Basically, the key points are:
- CommonEntities are unique by Name and by the related common entity, if any (e.g. CommonEntity is unique by CommonEntity.Name + CommonEntityGroup.Name).
- Producers usually just pass DTOs with the (unique) Names of those CommonEntities and related information, so any Common(Group)Entity has to be found or created by Name.
So, with Entity Framework (database first, though it probably doesn't matter) as the DAL and SQL Server as storage, what is an efficient and reliable way to ensure that all those producers successfully persist their intersecting object graphs at the same time?
Taking into account that the UNIQUE INDEX already ensures that there won't be duplicate CommonEntities (the Name, GroupName pair is unique), I can see the following solutions:
In such a case, when SaveChanges is called for related entities, there won't be any index violations due to other producers creating the same entities a moment before.
To achieve this I would have something like:
public class CommonEntityGroupRepository // sort of
{
    public CommonEntityGroupRepository(EntitiesDbContext db) ...

    // CommonEntityRepository will use this class/method internally to create parent CommonEntityGroup.
    public CommonEntityGroup FindOrCreateAndSave(String groupName)
    {
        return
            this.TryFind(groupName) ?? // db.FirstOrDefault(...)
            this.CreateAndSave(groupName);
    }

    private CommonEntityGroup CreateAndSave(String groupName)
    {
        var group = this.Db.CommonEntityGroups.Create();
        group.Name = groupName;
        this.Db.CommonEntityGroups.Add(group);
        try
        {
            this.Db.SaveChanges();
            return group;
        }
        catch (DbUpdateException dbExc)
        {
            // Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)
            // Detach the failed entity so that later SaveChanges calls do not retry the insert.
            this.Db.Entry(group).State = EntityState.Detached;
            return this.Find(groupName); // TryFind that throws exception.
        }
    }
}
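The check mentioned in the catch block could look something like the sketch below. It assumes the provider is System.Data.SqlClient and that the violation surfaces as a SqlException somewhere in the inner-exception chain of the DbUpdateException. SQL Server reports duplicate keys as error 2601 (unique index) or 2627 (unique or primary key constraint). The SqlErrors class and the IsUniqueViolation name are hypothetical helpers introduced here for illustration, not part of Entity Framework.

```csharp
using System;
using System.Data.Entity.Infrastructure;
using System.Data.SqlClient;

// Hypothetical helper (not an EF API): decides whether a failed SaveChanges
// was caused by a duplicate key rather than by some other error.
public static class SqlErrors
{
    private const int DuplicateKeyInUniqueIndex = 2601;
    private const int UniqueConstraintViolation = 2627;

    public static bool IsUniqueViolation(DbUpdateException exception)
    {
        // EF wraps the provider exception, so walk the whole inner-exception chain.
        for (Exception e = exception; e != null; e = e.InnerException)
        {
            var sqlException = e as SqlException;
            if (sqlException != null &&
                (sqlException.Number == DuplicateKeyInUniqueIndex ||
                 sqlException.Number == UniqueConstraintViolation))
            {
                return true;
            }
        }
        return false;
    }
}
```

Inside the catch block this would be used as `if (!SqlErrors.IsUniqueViolation(dbExc)) throw;` before falling back to Find, so that unrelated failures are not silently swallowed.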
With this approach there will be multiple calls to SaveChanges and each CommonEntity will have its own sort of a Repository, though it seems to be the most reliable solution.
A bit ugly and inefficient (with 10 CommonEntities we may have to retry it 10 times), but simple and more or less reliable.
I am not sure that there is an easy and reliable way to replace duplicate entries in more or less complex object graphs, though both a case-specific and a more generic reflection-based solution can be implemented.
Still, like the previous solution, it may require multiple retries.
I doubt that it will be any easier to handle inside a stored procedure. It will be the same optimistic or pessimistic approach, just implemented on the database side.
Though it may provide better performance (not an issue in this case) and put the insertion logic into one common place.
I would probably try the first solution, but perhaps there are better alternatives or some potential pitfalls.
Table Valued Parameters
One option is to use table valued parameters instead of individual calls to the database.
Example procedure using a table valued parameter:
create type dbo.CommonEntity_udt as table (
    CommonEntityGroupId int not null
  , Name nvarchar(100) not null
  , primary key (CommonEntityGroupId, Name)
);
go
create procedure dbo.CommonEntity_set (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1
      from @CommonEntity as s
      where not exists (
        select 1
          from dbo.CommonEntity as t
          where s.Name = t.Name
            and s.CommonEntityGroupId = t.CommonEntityGroupId
      ))
    begin;
      -- CommonEntityGroupId is not null in dbo.CommonEntity, so insert it along with Name
      insert dbo.CommonEntity (CommonEntityGroupId, Name)
        select s.CommonEntityGroupId, s.Name
          from @CommonEntity as s
          where not exists (
            select 1
              from dbo.CommonEntity as t with (updlock, holdlock)
              where s.Name = t.Name
                and s.CommonEntityGroupId = t.CommonEntityGroupId
          );
    end;
end;
go
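For completeness, here is a rough sketch of how a producer might call this procedure from C# with plain ADO.NET. It assumes System.Data.SqlClient, a connection string supplied by the caller, and that the group Ids are already known. The CommonEntityWriter class and the EnsureCommonEntities method are hypothetical names introduced for illustration. The DataTable columns are declared in the same order as in dbo.CommonEntity_udt, because table valued parameters are matched by column position.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical wrapper around dbo.CommonEntity_set (names are illustrative only).
public static class CommonEntityWriter
{
    // Each item is a (CommonEntityGroupId, Name) pair. The pairs must be distinct,
    // because the table type declares a primary key on both columns.
    public static void EnsureCommonEntities(
        string connectionString,
        IEnumerable<KeyValuePair<int, string>> items)
    {
        // Column order mirrors dbo.CommonEntity_udt.
        var table = new DataTable();
        table.Columns.Add("CommonEntityGroupId", typeof(int));
        table.Columns.Add("Name", typeof(string));
        foreach (var item in items)
        {
            table.Rows.Add(item.Key, item.Value);
        }

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.CommonEntity_set", connection))
        {
            command.CommandType = CommandType.StoredProcedure;

            // Pass the whole batch as one structured (table valued) parameter.
            var parameter = command.Parameters.AddWithValue("@CommonEntity", table);
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.CommonEntity_udt";

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

After the call returns, the producer can load the now-existing rows by Name and attach them to its object graph, so all missing common entities are created in one round trip rather than one SaveChanges call per entity.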
table valued parameter reference:
I don't recommend merge unless there is a compelling argument for it. This situation is only looking at inserting, so it seems like overkill.
Example merge version with table valued parameter:
create procedure dbo.CommonEntity_merge (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1
      from @CommonEntity as s
      where not exists (
        select 1
          from dbo.CommonEntity as t
          where s.Name = t.Name
            and s.CommonEntityGroupId = t.CommonEntityGroupId
      ))
    begin;
      merge dbo.CommonEntity with (holdlock) as t
      using (select CommonEntityGroupId, Name from @CommonEntity) as s
        on (t.Name = s.Name
          and s.CommonEntityGroupId = t.CommonEntityGroupId)
      when not matched by target
        then insert (CommonEntityGroupId, Name)
        values (s.CommonEntityGroupId, s.Name);
    end;
end;
go
merge reference:
MERGE Statement - Aaron Bertrand
ignore_dup_key code comment:
// Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)
ignore_dup_key is going to use serializable behind the scenes. That is potentially costly overhead on non-clustered indexes, and even when the index is clustered it can have significant costs depending on the amount of duplicates.
This can be handled in the stored procedures using Sam Saffron's upsert (update/insert) pattern, or one of the patterns shown here: Performance impact of different error handling techniques - Aaron Bertrand.