I posted this question earlier:
INSERT Statement Expensive Queries on Activity Monitor
As you will see there, the XML structure has several levels.
I have created a separate table for each level:
Organisation = organisation_id (PRIMARY KEY)
Contacts = organisation_id (FOREIGN KEY)
Roles = organisation_id (FOREIGN KEY)
Rels = organisation_id (FOREIGN KEY)
Succs = organisation_id (FOREIGN KEY)
What I want is to generate the organisation_id and do the insert on each table in a cascading manner. At the moment the process takes almost 2 hours for 300k records. I have tried 3 approaches:
1. Convert the XML to a list of objects and send it in batches of 1,000 as JSON text to a stored procedure that uses OPENJSON.
2. Convert the XML to a list of objects, save each batch of 1,000 as a JSON file that SQL Server can read, and pass the file path to a stored procedure, which then opens the JSON file using OPENROWSET and OPENJSON.
3. Send the path of the XML file to a stored procedure, then use OPENROWSET and OPENXML.
All three approaches (1-3) insert the data into a flat temp table, then iterate over each row to call a different INSERT stored procedure for each table. Approach #3 fails with errors at 300k records but works on 4 records.
The other question is: will it be much faster if I use a physical table rather than a temp table?
-------UPDATE-------
As explained in the linked question, I was using a WHILE loop. Someone suggested in the comments doing a batch insert into each of the tables instead. The problem is that, for Contacts for example, I can only do this if I know the organisation_id:
select
    organisation_id = IDENTITY( bigint )  -- if I can generate the actual organisation_id
    ,name           = Col.value('.','nvarchar(20)')
    ,contact_type   = c.value('(./@type)[1]','nvarchar(50)')
    ,contact_value  = c.value('(./@value)[1]','nvarchar(50)')
into
    #temporganisations
from
    @xml.nodes('ns1:OrgRefData/Organisations/Organisation') as Orgs(Col)
    outer apply Orgs.Col.nodes('Contacts/Contact') as Cs(c)
Then when I do the batch insert:
insert into contacts
(
    organisation_id, type, value
)
select
    torg.organisation_id -- if this is the actual id then perfect
    ,torg.contact_type
    ,torg.contact_value
from #temporganisations torg
I would suggest that you shred the XML client-side and switch over to some kind of bulk copy; this will generally perform much better.
At the moment, you cannot do a normal bcp or SqlBulkCopy, because you also need the foreign key. You need a way to uniquely identify an Organisation within the batch, and you say that is difficult owing to the number of columns needed for that.
Instead, generate some kind of unique ID client-side; an incrementing integer will do. You then assign this ID to the child objects as you shred the XML into DataTables / IEnumerables / CSV files.
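That client-side shredding step can be sketched as follows. This is a Python illustration only (the real loader would likely be C# building DataTables); the XML shape here is made up, so the element and attribute names are assumptions, not the real feed's:

```python
import xml.etree.ElementTree as ET
from itertools import count

def shred(xml_text):
    """Shred the XML into flat row lists, one per target table.

    Each organisation gets a client-generated, incrementing temporary ID,
    which is stamped onto its child rows so they can be matched up later.
    """
    root = ET.fromstring(xml_text)
    next_id = count(1)  # incrementing temporary ID, unique within the batch
    org_rows, contact_rows = [], []
    for org in root.iter("Organisation"):
        temp_id = next(next_id)
        org_rows.append({"temp_id": temp_id,
                         "name": org.findtext("Name", "")})
        for c in org.iter("Contact"):
            contact_rows.append({"temp_id": temp_id,  # FK back to the org
                                 "type": c.get("type"),
                                 "value": c.get("value")})
    return org_rows, contact_rows

# Hypothetical miniature feed to show the shape of the output
xml_text = """
<Organisations>
  <Organisation><Name>Acme</Name>
    <Contacts><Contact type="tel" value="123"/></Contacts>
  </Organisation>
  <Organisation><Name>Globex</Name>
    <Contacts><Contact type="url" value="http://g.example"/></Contacts>
  </Organisation>
</Organisations>
"""
orgs, contacts = shred(xml_text)
```

The row lists then become the payload for whichever bulk-copy route you pick below.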
You have two options:
Option 1: remove the IDENTITY from OrganisationId and just directly insert your generated IDs. This means you can leverage standard SqlBulkCopy procedures. The downside is that you lose the benefit of automatic IDENTITY assignment, but you could instead use the SqlBulkCopyOptions.KeepIdentity option, which only applies to this insert, and carry on with IDENTITY for other inserts. You would need to estimate a correct batch of IDs that won't clash.
A variation on this is to use GUIDs, which are always unique. I don't really recommend this option.
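The KeepIdentity route amounts to renumbering the client-side temporary IDs into a reserved, clash-free block of real IDs before the bulk copy. A minimal sketch of that renumbering, in Python with hypothetical dictionary rows (in a real C# loader these would be DataTable rows sent via SqlBulkCopy):

```python
def assign_real_ids(org_rows, child_rows, start_id):
    """Map each temporary org ID onto a reserved range of real IDs.

    start_id would come from something like MAX(OrganisationId) plus a
    safety margin, estimated so the batch cannot clash with the IDENTITY.
    """
    real = {}  # temp_id -> real organisation_id
    for offset, org in enumerate(org_rows):
        real[org["temp_id"]] = start_id + offset
        org["organisation_id"] = start_id + offset
    for child in child_rows:
        # propagate the real FK onto every child row before bulk copy
        child["organisation_id"] = real[child["temp_id"]]
    return org_rows, child_rows

orgs = [{"temp_id": 1}, {"temp_id": 2}]
contacts = [{"temp_id": 2, "type": "tel"}, {"temp_id": 1, "type": "url"}]
assign_real_ids(orgs, contacts, start_id=5000)
```

After this, both parent and child tables can be bulk-copied independently, with no server-side ID mapping step at all.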
Option 2: define an equivalent table type for each of the tables. Each has a column for the temporary primary key of the Organisation:
CREATE TYPE OrganisationType AS TABLE
(TempOrganisationID int PRIMARY KEY,
SomeData varchar...
Pass the shredded XML through as table-valued parameters: you would have @Organisations, @Contacts etc.
Then you would have SQL along the following lines:
-- This stores the real IDs
DECLARE @OrganisationIDs TABLE
(TempOrganisationID int PRIMARY KEY, OrganisationId int NOT NULL);
-- We need a hack to get OUTPUT to work with non-inserted columns, so we use a weird MERGE
MERGE INTO Organisation t
USING @Organisations s
ON 1 = 0 -- never match
WHEN NOT MATCHED THEN
INSERT (SomeData, ...)
VALUES (s.SomeData, ...)
OUTPUT
s.TempOrganisationID, inserted.OrganisationID
INTO @OrganisationIDs
(TempOrganisationID, OrganisationID);
-- We now have each TempOrganisationID matched up with a real OrganisationID
-- Now we can insert the child tables
INSERT Contact
(OrganisationID, [Type], [Value]...)
SELECT o.OrganisationID, c.[Type], c.[Value]
FROM @Contacts c
JOIN @OrganisationIDs o ON o.TempOrganisationID = c.TempOrganisationID;
-- and so on for all the child tables
A further variation is to OUTPUT the real IDs back to the client, have the client join them to the child tables, then BulkCopy those back again as part of the child tables. This still requires the MERGE trick, though, and you risk complicating the client code significantly.