Search code examples
c#sql-serverentity-frameworkmany-to-manyentity-framework-6

Many-to-many inserts with Entity Framework and SQL Server


In our database we have this parent - child - grandchild relation that is many-to-many relationship ( twice ). This happens through two junction / cross-reference tables. Parent/Child/Grandschild tables have varchar functional keys that are unique. Below is a simplified version showing only the first step in the hierarachy:

Parent              Junction             Child
+----+-------+      +------+------+      +----+-------+
| PK | F_KEY |      | PK_1 | PK_2 |      | PK | F_KEY |
+----+-------+      +------+------+      +----+-------+
|  1 | AAA   |      |    1 |    1 |      |  1 | BBB   |
+----+-------+      +------+------+      +----+-------+

The number of records in both parent / child / grandchild are several millions.

Situation

We need to deal with the situation where we're given a collection of parent-child-grandchild and some of them may already be present in the database. We need to insert the ones that are not yet present, ignore rest ( based on functional key ).

So the current implementation:

  1. switches off autodetectChanges and disables all constraints on the datacontext.
  2. checks for parents already present ( using F_KEY ) - inserts non existing ones
  3. checks for children already present ( F_KEY ) - inserts non existing ones and I think manually updates EF
  4. idem for grandchildren

Not surprisingly - something went wrong and now we have missing links in our junction table and we're having to fix this through scripts.

This implementation doesn't sit well with me. Argument of the dev was performance. Original implementation did not perform:

  1. Given list of parents - ignore existing ones
  2. Look at remaining children - replace existing ones with DbEntries
  3. Idem for grandchildren
  4. SaveChanges()

Didn't perform. My colleague said - 'think about it: you have to enter parents, then retrieve the id's. Save children, retrieve id's, use these for first junction table etc.'

Question

How can I make this perform? I mean - it works, but not very maintainable and really rubs me the wrong way.

An idea I had - if we make the junction table contain the unique functional keys like so:

Parent              Junction             Child
+----+-------+      +------+------+      +----+-------+
| PK | F_KEY |      | PK_1 | PK_2 |      | PK | F_KEY |
+----+-------+      +------+------+      +----+-------+
|  1 | AAA   |      |  AAA |  BBB |      |  1 | BBB   |
+----+-------+      +------+------+      +----+-------+

Then we don't have to retrieve the ids of the inserted items to store them in the junction table. Does that make sense? Will EF be able to benefit from that?

If that doesn't work - and we're not using EF in the way it's at its best - we might as well consider using stored procedures or direct queries to the database. You save the overhead of EF altogether and at least then you're in full control of what we're doing and not have EF make the queries for us behind the scenes.

What are the thoughts on that? Any other suggestions are very welcome as well of course.


Solution

  • For this kind of task I would make a stored procedure that accepts few table-valued parameters https://msdn.microsoft.com/en-us/library/bb510489.aspx https://msdn.microsoft.com/en-us/library/bb675163(v=vs.110).aspx with the list of new Parents, Children, Junctions, GrandChildren, Junctions and perform all merging on the server inside one transaction without transmitting anything back to the client.

    A bunch of MERGE T-SQL statements processing rows in bulk worked quite well for me in similar cases.

    Merge Parents, then Children, then GrandChildren tables. Then Junction between Parents and Children. Then Junction between Children and GrandChildren.

    As long as the size of collection that you need to merge is reasonable (say, around 10K rows) it would work very well with a single call to the stored procedure. If you have to merge significantly more rows, consider splitting them in smaller batches and calling your stored procedure several times.