Search code examples
sql-serveropenxmlopenrowsetopen-json

Is there a way to reserve generated ID on a temporary table


I posted this question

INSERT Statement Expensive Queries on Activity Monitor

As you will see the XML structure has different levels.

I have created different tables

Organisation = organisation_id (PRIMARY_KEY)
Contacts = organisation_id (FOREIGN_KEY)
Roles = organisation_id (FOREIGN_KEY)
Rels = organisation_id (FOREIGN_KEY)
Succs = organisation_id (FOREIGN_KEY)

What I want is to generate the organisation_id and do the insert on each table in cascading manner. At the moment the process takes almost 2 hours for 300k. I have 3 approach

  1. Convert XML to List Object and Send by batch(1000) as JSON text and send to a stored procedure the uses OPENJSON

  2. Convert XML to list object and send by batch (1000) and save the batch as JSON a file that SQL Server can read and pass the filepath on a stored procedure which then opens the JSON file using OPENROWSET and OPENJSON

  3. Send the path to XML to a stored procedure then use OPENROWSET and OPENXML.

All process (1-3) inserts the data into a FLAT temp table then iterate each row to call different INSERT stored procedure for each tables. Approach #3 seems to fail with errors on 300k but works on 4 records.

The other question is, will it be much faster if I use an physical table than a temp table?

-------UPDATE------- As explained on the link, I was doing while loop. Someone suggested / commented to do a batch insert on each of the table. The problem is, for example, Contacts I can only do this if I know the organisation_id

    select
        organisation_id = IDENTITY( bigint ) -- IF I CAN GENERATE THE ACTUAL ORGANISATION ID
        ,name = Col.value('.','nvarchar(20)')           
        ,contact_type = c.value('(./@type)[1]','nvarchar(50)')
        ,contact_value= c.value('(./@value)[1]','nvarchar(50)')
    into
        #temporganisations
    from
        @xml.nodes('ns1:OrgRefData/Organisations/Organisation') as Orgs(Col)
     outer apply Orgs.Col.nodes('Contacts/Contact') as Cs(c)

Then when I do the batch insert

insert into contacts
    (
      organisation_id,type,value
    )
select
   torg.organisation_id -- if this is the actual id then perfect
   ,torg.type
   ,torg.value
from #temporg torg

Solution

  • I would suggest that you shred the XML client-side, and switch over to doing some kind of Bulk Copy, this will generally perform much better.

    At the moment, you cannot do a normal bcp or SqlBulkCopy, because you also need the foreign key. You need a way to uniquely identify Organisation within the batch, and you say that is difficult owing to the number of columns needed for that.

    Instead, you need to generate some kind of unique ID client-side, an incrementing integer will do. You then assign this ID to the child objects as you are shredding the XML into Datatables / IEnumerables / CSV files.

    You have two options:

    • The easiest in many respects, is to not use IDENTITY from OrganisationId and just directly insert your generated ID. This means you can leverage standard SqlBulkCopy procedures.

    The downside is that you lose the benefit of automatic IDENTITY assignment, but you could instead just use the SqlBulkCopyOptions.KeepIdentity option which only applies to this insert, and carry on with IDENTITY for other inserts. You would need to estimate a correct batch of IDs that won't clash.

    A variation on this is to use GUIDs, these are always unique. I don't really recommend this option.


    • If you don't want to do this, then it gets quite a bit more complex.

    You need to define equivalent Table Types for each of the tables. Each has a column for the temporary primary key of the Organisation

    CREATE TYPE OrganisationType AS TABLE
        (TempOrganisationID int PRIMARY KEY,
         SomeData varchar...
    

    Pass through the shredded XML as Table-Valued Parameters. You would have @Organisations, @Contacts etc.

    Then you would have SQL along the following lines:

    -- This stores the real IDs
    DECLARE @OrganisationIDs TABLE
        (TempOrganisationID int PRIMARY KEY, OrganisationId int NOT NULL);
    
    -- We need a hack to get OUTPUT to work with non-inserted columns, so we use a weird MERGE
    MERGE INTO Organisation t
    USING @Organisations s
      ON 1 = 0   -- never match
    WHEN NOT MATCHED THEN
      INSERT (SomeData, ...)
      VALUES (s.SomeData, ...)
    OUTPUT
        s.TempOrganisationID, inserted.OrganisationID
    INTO @OrganisationIDs
        (TempOrganisationID, OrganisationID);
    
    -- We now have each TempOrganisationID matched up with a real OrganisationID
    -- Now we can insert the child tables
    
    INSERT Contact
        (OrganisationID, [Type], [Value]...)
    SELECT o.OrganisationID, c.[Type], c.[Value]
    FROM @Contact c
    JOIN @OrganisationIDs o ON o.TempOrganisationID = c.TempOrganisationID;
    
    -- and so on for all the child tables
    
    • Instead of saving the IDs to a table variable, you could instead stream back the OUTPUT to client, and have the client join the IDs to the child tables, then BulkCopy them back again as part of the child tables.
      This makes the SQL simpler, however you still need the MERGE, and you risk complicating the client code significantly.