Tags: python, xml, xsd, xsd-validation, xml-validation

Can xs:anyURI contain square brackets in XSD?


XML validation fails with the following error:

    Element 'CategoryPageUrl': 'http://www.example.com/products?my_query_parameter[]=45' is not a valid value of the atomic type 'xs:anyURI'., line 29

The feed looks like this:

    <Category>
        <ExternalId>1234</ExternalId>
        <Name>Name</Name>
        <CategoryPageUrl>http://www.example.com/products?my_query_parameter[]=45</CategoryPageUrl>
    </Category>

The relevant part of the schema looks like this:

    <xs:complexType name="CategoryType">
      <xs:all>
        <xs:element name="ExternalId" type="ExternalIdType" minOccurs="0"/>
        <xs:element name="Name" type="xs:string" minOccurs="0"/>
        <xs:element name="CategoryPageUrl" type="xs:anyURI" minOccurs="0"/>
      </xs:all>
    </xs:complexType>

Solution

  • No, an xs:anyURI cannot contain square brackets ([ or ]).

    Your URI itself is invalid, and not only as far as XSD is concerned.

    xs:anyURI follows RFC 2396, as amended by RFC 2732.

    RFC 2396 defines the following productions for the query portion of a URI, which is where you are trying to use square brackets:

      query         = *uric
      uric          = reserved | unreserved | escaped
      reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                      "$" | ","
      unreserved    = alphanum | mark
      mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                      "(" | ")"
    
      escaped       = "%" hex hex
      hex           = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                              "a" | "b" | "c" | "d" | "e" | "f"
    
      alphanum      = alpha | digit
      alpha         = lowalpha | upalpha
    
      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"
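To make the productions above concrete, here is a minimal Python sketch that transcribes `query = *uric` into a regular expression and tests the query from the question against it. It applies RFC 2396 on its own, without RFC 2732's changes; the names `URIC`, `QUERY_RE` and `is_rfc2396_query` are hypothetical, and this is not how lxml/libxml2 or any other validator actually implements `xs:anyURI`.

```python
import re

# uric = reserved | unreserved | escaped   (RFC 2396, transcribed by hand)
#   reserved   : ; / ? : @ & = + $ ,
#   unreserved : letters, digits and the "mark" characters - _ . ! ~ * ' ( )
#   escaped    : "%" followed by two hex digits
URIC = r"(?:[;/?:@&=+$,A-Za-z0-9\-_.!~*'()]|%[0-9A-Fa-f]{2})"

# query = *uric  (zero or more uric, nothing else)
QUERY_RE = re.compile(URIC + "*")


def is_rfc2396_query(query: str) -> bool:
    """Return True if the whole string matches RFC 2396's query production."""
    return QUERY_RE.fullmatch(query) is not None


print(is_rfc2396_query("my_query_parameter[]=45"))      # False: '[' is not a uric
print(is_rfc2396_query("my_query_parameter%5B%5D=45"))  # True: brackets escaped
```

The first call fails because `[` falls outside every alternative of `uric`; the percent-escaped form passes.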
    

    As you can see, [ and ] are not allowed there. Furthermore, section 2.4.3 of RFC 2396, "Excluded US-ASCII Characters", lists square brackets among the "unwise" characters, which are excluded from URIs and must be escaped wherever they appear:

    unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
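Since brackets are neither `uric` characters nor safe to send literally, the usual remedy is the `escaped` production: write `[` as `%5B` and `]` as `%5D`. Below is a minimal sketch using the standard library's `urllib.parse`, assuming only the query needs fixing; `escape_brackets_in_query` is a hypothetical helper name. Whether the receiving server decodes `%5B%5D` back to `[]` depends on that server, so it is worth checking before changing the feed.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_brackets_in_query(url: str) -> str:
    """Percent-encode characters such as '[' and ']' in the query of `url`."""
    parts = urlsplit(url)
    # Characters in `safe` are left untouched: the RFC 2396 reserved and mark
    # characters, plus '%' so existing %XX escapes are not encoded twice
    # (side effect: a stray '%' that is not part of an escape is kept as-is).
    # '[' and ']' are not in `safe`, so quote() turns them into %5B and %5D.
    escaped_query = quote(parts.query, safe=";/?:@&=+$,-_.!~*'()%")
    return urlunsplit(parts._replace(query=escaped_query))


print(escape_brackets_in_query(
    "http://www.example.com/products?my_query_parameter[]=45"))
# http://www.example.com/products?my_query_parameter%5B%5D=45
```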
    

    RFC 2732 does define a syntax for IPv6 addresses that uses [ and ], but it reserves them for that purpose, which applies to the host portion of a URI, not the query portion.
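The host-versus-query distinction can be seen with Python's `urllib.parse.urlsplit`: in a bracketed IPv6 host the brackets are delimiters that `hostname` strips off, whereas in the question's URL they are just part of the query text. Note that `urlsplit` is a lenient parser that does not validate query characters, so it accepts the URL the XSD validator rejects. The address `2001:db8::1` is an illustrative documentation-range value, not taken from the question.

```python
from urllib.parse import urlsplit

# Brackets in the host: an IPv6 literal, the use RFC 2732 defines.
ipv6 = urlsplit("http://[2001:db8::1]:8080/products")
print(ipv6.hostname)  # 2001:db8::1  (brackets removed as delimiters)
print(ipv6.port)      # 8080

# Brackets in the query: urlsplit does not check them; they stay as plain text.
feed = urlsplit("http://www.example.com/products?my_query_parameter[]=45")
print(feed.hostname)  # www.example.com
print(feed.query)     # my_query_parameter[]=45
```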