Search code examples
sql-serverxmlxsd

XSD to SQL Server table


I have a few XSD files that define a report, wondering if there's a way load this into a table that I can query.

The elements XSD defines the required fields, it looks like:

  <xs:simpleType name="NHSNumberStatusIndicatorCode_Withheld_Type">
    <xs:restriction base="ns:AlphaNumeric_Type">
        <xs:length value="2"/>
        <xs:enumeration value="01"/>
        <xs:enumeration value="02"/>
        <xs:enumeration value="03"/>
        <xs:enumeration value="04"/>
        <xs:enumeration value="05"/>
        <xs:enumeration value="06"/>
        <xs:enumeration value="07"/>
        <xs:enumeration value="08"/>
    </xs:restriction>
</xs:simpleType>

I need to convert it to

     Field_Name         Type                    Length        enumeration     MaxLen      MinLen
     NHSNumber...       AlphaNumeric_Type        2                 01
     NHSNumber...       AlphaNumeric_Type        2                 02
     NHSNumber...       AlphaNumeric_Type        2                 03

and so on

Some fields don't have enumeration, they might look like:

         <xs:simpleType name="CDSOrganisationIdentifier_Type">
    <xs:restriction base="ns:AlphaNumeric_Type">
        <xs:minLength value="3"/>
        <xs:maxLength value="5"/>
    </xs:restriction>
</xs:simpleType>

OR

   <xs:simpleType name="Number_Type">
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
</xs:simpleType>

This returns Null for all the fields: enter image description here


Solution

  • You can load it into an XML variable and then use various SQL Server XML methods to select the nodes of interest and extract the values. See xml Data Type Methods and Path Expressions (XQuery) for more information.

    In the following, I wrapped your XSD fragment in a "schema" element that also defined the "xs:" namespace prefix.

    DECLARE @xml XML = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:simpleType name="NHSNumberStatusIndicatorCode_Withheld_Type">
        <xs:restriction base="ns:AlphaNumeric_Type">
            <xs:length value="2"/>
            <xs:enumeration value="01"/>
            <xs:enumeration value="02"/>
            <xs:enumeration value="03"/>
            <xs:enumeration value="04"/>
            <xs:enumeration value="05"/>
            <xs:enumeration value="06"/>
            <xs:enumeration value="07"/>
            <xs:enumeration value="08"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:schema >'
    
    SELECT
        N.node.value('(../../@name)[1]', 'nvarchar(max)') AS Field_Name,
        N.node.value('(../@base)[1]', 'nvarchar(max)') AS Type,
        N.node.value('(../xs:length/@value)[1]', 'int') AS Length,
        N.node.value('@value[1]', 'nvarchar(max)') AS enumeration
    FROM @xml.nodes('//xs:enumeration') N(node)
    

    The .nodes() method selects all of the enumeration nodes from the supplied xml. The leading // selects nodes at any depth. The N(nodes) provide arbitrary table and column aliases to the selected results. The .value() method is then used to select attributes values from either the selected node, parent node, or sibling node in the case of Length. The @ prefix on a name refers to an attribute instead of a child node. Because many XPath selectors could potentially select more than one value, you often need to use [1] or (...)[1] to select just the first.

    Result:

    Field_Name Type Length enumeration
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 01
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 02
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 03
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 04
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 05
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 06
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 07
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 08

    See this db<>fiddle.

    The XSD can also be loaded into an XSD typed column of a table and accessed in a manner similar to the above by replacing @xml with the column reference.

    The above is coded for this specific enumeration type scenario. If your actual XSD contains other type definitions that don't follow this pattern, you may need to handle those cases separately.

    UPDATE: To accommodate the following additional cases from your amended post ...

      <xs:simpleType name="CDSOrganisationIdentifier_Type">
        <xs:restriction base="ns:AlphaNumeric_Type">
            <xs:minLength value="3"/>
            <xs:maxLength value="5"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType name="Number_Type">
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{10}"/>
        </xs:restriction>
      </xs:simpleType>
    

    a different approach is needed. After an initial .nodes() call to get the <xs:simpleType ...> nodes, additional calls are made to access selected subordinate nodes, some of which may or may not exist. Finally, the .value() calls in the select list have been updated to pull data relative to one of the several selected nodes.

    SELECT
        S.Node.value('(@name)[1]', 'nvarchar(max)') AS Field_Name,
        R.Node.value('(@base)[1]', 'nvarchar(max)') AS Type,
        R.Node.value('(xs:length/@value)[1]', 'int') AS Length,
        E.Node.value('@value[1]', 'nvarchar(max)') AS enumeration,
        R.Node.value('(xs:maxLength/@value)[1]', 'int') AS MaxLength,
        R.Node.value('(xs:minLength/@value)[1]', 'int') AS MinLength,
        R.Node.value('(xs:pattern/@value)[1]', 'nvarchar(max)') AS Pattern
    FROM @xml.nodes('//xs:simpleType') S(Node)
    OUTER APPLY S.Node.nodes('xs:restriction') R(Node) -- Likely always one occurrence
    OUTER APPLY R.Node.nodes('xs:enumeration') E(Node) -- Zero or more occurrences
    

    An OUTER APPLY is like a LEFT JOIN to a subselect, except that no ON condition used.

    The result is:

    Field_Name Type Length enumeration MaxLength MinLength Pattern
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 01 null null null
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 02 null null null
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 03 null null null
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 04 null null null
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 05 null null null
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 06 null null null
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 07 null null null
    NHSNumberStatusIndicatorCode_Withheld_Type ns:AlphaNumeric_Type 2 08 null null null
    CDSOrganisationIdentifier_Type ns:AlphaNumeric_Type null null 5 3 null
    Number_Type xs:string null null null null [0-9]{10}

    See this updated db<>fiddle.