Tags: regex, xml, xsd, xsd-1.1

Regex in an XSD assertion limited to the beginning of the element value


I have created assertions in an XSD 1.1 schema that contain regular expressions. The expressions are supposed to reject values that begin with a Roman numeral or a number followed by a period and a space. From what I have read, I don't need to anchor a regular expression in an XSD schema because it already applies to the beginning of the value (I may have misunderstood that). However, I am not able to limit the regular expressions to the beginning of the value.

XSD:

   <xs:element name="node123">
         <xs:simpleType>
                <xs:restriction base="xs:string">
                       <xs:assertion test="not(matches($value, '[\d].*\.\s.|[I].*\.\s.*|[V].*\.\s.*|[X].*\.\s.*|[L].*\.\s.*|[C].*\.\s.*'))"/>
                       <xs:assertion test="not(starts-with($value, '-'))"/>
                       <xs:assertion test="not(starts-with($value, '–'))"/>
                       <xs:assertion test="not(starts-with($value, '—'))"/>
                </xs:restriction>
         </xs:simpleType>
   </xs:element>

The false positives (valid values that the assertions reject) include:

Mismash of Fid. R. Crim. Z

Shipped C. O. D

I can't use starts-with() with the number expressions because that doesn't work at all. However, when I use starts-with() with the other expressions, it doesn't apply to the whole element value.

Is there a way to limit the expressions to just the first word or the start of the element value?


Solution

  • Notes:

    1. XSD regular expressions in xsd:pattern facets are implicitly anchored at the start (^) and end ($) of the value.

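    2. XPath regular expressions, which are used by functions such as matches() in xsd:assertion, are not implicitly anchored: matches() returns true if the pattern matches any substring of the value.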

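    Given the above, the regex provided by @WiktorStribiżew (along with his advice to add ^) is a reasonable approximation of your goal of excluding values that start with something that looks like an Arabic or Roman numeral followed by a period and a space, or with a dash. It also replaces the three separate starts-with() assertions: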

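      <xs:element name="node123">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <!-- ^ ties the match to the start of the value. Reject a leading hyphen,
                 en dash or em dash, or a leading run of digits or the letters I, V, X, L, C
                 followed by a period and a whitespace character. -->
            <xs:assertion test="not(matches($value, '^([-–—]|[0-9IVXLC]+\.\s)'))"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>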
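
    To illustrate the effect of the ^ anchor, this is how the assertion above should classify a few sample values under an XSD 1.1 processor. The first two values are the false positives from the question. The others are made-up illustrations. Each element is a separate fragment rather than part of one instance document:

      <!-- Accepted: the value does not start with a dash or with numeral characters plus ". " -->
      <node123>Mismash of Fid. R. Crim. Z</node123>
      <node123>Shipped C. O. D</node123>
      <node123>XIV Chapter heading</node123>   <!-- no period right after XIV, so no match -->

      <!-- Rejected: the regex matches at the very start of the value -->
      <node123>IV. Introduction</node123>
      <node123>12. Numbered item</node123>
      <node123>- Bulleted item</node123>
      <node123>C. O. D. shipments</node123>

    The last rejected value shows why this remains an approximation. A value that legitimately begins with a capital I, V, X, L or C followed by a period and a space, such as an initial or an abbreviation, is still rejected.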