Search code examples
sql-serverxmlxml-parsingopenxmlsqlxml

Importing and parsing a large XML file in SQL Server (when "normal" methods are rather slow)


I have a large XML file that I need to import and parse into tabular structure ("flatten") in SQL Server. By "large" I mean a file that is around 450 MB, contains up to 6-7 nested levels and lots of elements, ~300.

I tried parsing the file both with OPENXML and Xml.Nodes. Both of the methods are slow. A partial query which reads a parent element and it's nested grandchildren takes several minutes if not dozens to run.

I tried using the SQLXML Bulk Load method. Unfortunately I couldn't - because the file isn't structured properly. There is an element which is logically a parent element which isn't nested as a parent physically.

Do you think the only posiblle solution left is to use .NET or Java? Is there something I'm missing?

I would strongly prefer a dynamic solution, to some degree. I don't want the SQL Server developers to relay on a procedural, compiled, code that they have no control/knowledge about - in the event that some changes will occur (in the XML structure).

Thank you very much.


Solution

  • OK. I created an XML Index on the XML data column. (Just a primary one for now).

    A query that took ~4:30 minutes before takes ~9 seconds now! Seems that a table that stores the XML with a proper XML Index and the parsing the data with the xml.nodes() function are a feasible solution.

    Thank you all.