
Validating less than (<) and greater than (>) in XML via XSD?


I have this XML:

<?xml version="1.0" encoding="utf-8"?>
<data>
  <A>2&gt;1</A>
  <B>0&lt;1</B>
</data>

and I want to validate it with this XSD:

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="data">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="A">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[^&lt;&gt;]+" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="B">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[^&lt;&gt;]+" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>

I validate with xmllint and get these errors:

Schemas validity error: Element 'A': [facet 'pattern'] The Value '2>1' is not accepted by the pattern '[^<>]'.

Schemas validity error: Element 'B': [facet 'pattern'] The Value '0<1' is not accepted by the pattern '[^<>]'

As you can see, elements A and B contain no literal < or >; they contain 2&gt;1 and 0&lt;1. But before xmllint validates the XML, it converts 2&gt;1 to 2>1 and 0&lt;1 to 0<1, and then reports that the values do not match the pattern!

How can I make the strings "2&gt;1" and "0&lt;1" pass a pattern that requires the string not to contain a less-than or greater-than symbol?


Solution

  • This updated XSD,

    <xs:schema attributeFormDefault="unqualified"
               elementFormDefault="qualified"
               xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="data">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="A" type="ltGtExpType"/>
            <xs:element name="B" type="ltGtExpType"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:simpleType name="ltGtExpType">
        <xs:restriction base="xs:string">
          <xs:pattern value="\d+[&lt;&gt;]\d+" />
        </xs:restriction>
      </xs:simpleType>
    </xs:schema>
    

    will validate your XML,

    <?xml version="1.0" encoding="utf-8"?>
    <data>
      <A>2&gt;1</A>
      <B>0&lt;1</B>
    </data>
    

    successfully.


    That said, you appear to be trying to distinguish between < and &lt; (and between > and &gt;) at the XSD level. That is not possible, and no reasonable requirement needs it: an XML parser replaces those entity references with the characters they stand for before validation, so the validator only ever sees < and >. Furthermore, as part of its well-formedness checking, the parser will already have reported an error for any literal < that appears in character data rather than as the start of markup.
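
To see the parser behavior described above outside of xmllint, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It is an illustration under assumptions, not a schema validator: Python's `re.fullmatch` only approximates how an XSD pattern facet is matched (XSD patterns are implicitly anchored, and the two regex dialects differ in other details), and the helper `is_well_formed` is a hypothetical name introduced for this example.

```python
import re
import xml.etree.ElementTree as ET

# The XML from the question, with &gt; and &lt; entity references.
XML_DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<data>
  <A>2&gt;1</A>
  <B>0&lt;1</B>
</data>"""

root = ET.fromstring(XML_DOC)

# The parser has already replaced the entity references, so the
# application (or a validator) sees the actual characters.
a_text = root.find("A").text
b_text = root.find("B").text

# Approximations of the two XSD patterns (assumption: fullmatch
# stands in for the implicit anchoring of an XSD pattern facet).
ORIGINAL = re.compile(r"[^<>]+")     # pattern from the question
UPDATED = re.compile(r"\d+[<>]\d+")  # pattern from the answer


def is_well_formed(doc: bytes) -> bool:
    """Hypothetical helper: True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(repr(a_text), repr(b_text))
print("original pattern accepts A:", ORIGINAL.fullmatch(a_text) is not None)
print("updated pattern accepts A:", UPDATED.fullmatch(a_text) is not None)
print("literal '<' well-formed:", is_well_formed(b"<data><B>0<1</B></data>"))
```

The last line shows why the original pattern's check for < is redundant: a document with a literal < in character data is rejected by the parser before any schema validation could run.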